1 Personnel

Since  its  award,  seven  faculty  members  and  twelve  students  from  the  departments  of  Electrical  Engineering, 
Chemical  Engineering  and  Mathematics  were  funded  by  this  project  (including  one  post-doctoral  student 
and  three  students  funded  by  ASSERT  awards  related  to  this  grant). 

More  specifically,  the  faculty  members  involved  in  this  project  were: 


1.  T.  Cale,  Chemical  Engineering 

2.  P.  Crouch,  Electrical  Engineering 

3.  D.  Ferry,  Electrical  Engineering 


4.  M.  Kozicki,  Electrical  Engineering 

5.  G.  Raupp,  Chemical  Engineering 

6.  C.  Ringhofer,  Mathematics 

7. K. Tsakalis, Electrical Engineering

Students who were supported by the grant (at various stages) and their current status are listed below:

1.  R.  Bammi,  Ch.E.,  (T.  Cale),  Left  after  completing  his  M.S.  degree. 

2.  S.-J.  Yang,  Ch.E.,  (T.  Cale),  Left  project. 

3.  M.  Peters,  Ch.E.,  (T.  Cale),  Left  project. 

4.  Y.-L.  Lin,  Ch.E.,  (G.  Raupp,  K.  Tsakalis,  P.  Crouch),  Left  after  completing  his  M.S.  degree,  currently 
with  Texas  Instruments,  Dallas. 

5.  M.  Gobbert,  Math.  (C.  Ringhofer),  Completed  his  PhD,  currently  with  IMA,  Univ.  of  Minnesota. 

6.  S.  Shen,  E.E.,  (P.  Crouch),  Left  after  obtaining  employment  with  Texas  Instruments,  Dallas. 

7.  L.  Song,  E.E.  (K.  Tsakalis),  Continuing  towards  her  Ph.D.  degree  (expected  graduation  during  the 
1996-1997  academic  year) 

8.  P.  Thanikasalam,  E.E.  (D.  Ferry),  Continuing  towards  his  Ph.D.  degree. 

The  following  students  were  supported  by  ASSERT  awards  related  to  the  grant. 

9.  K.  Stoddard,  E.E.,  (P.  Crouch,  K.  Tsakalis,  M.  Kozicki),  Left  after  obtaining  M.S.  degree,  currently 
employed  at  SEMY  Eng.  Inc.,  Phoenix. 

10.  K.  Tracy,  Ch.E.,  (T.  Cale)  Completed  M.S.  degree. 

11.  Delbert  Herald,  E.E.  (M.  Kozicki,  K.  Tsakalis),  replaced  K.  Stoddard  and  currently  continuing  towards 
his  M.S.  degree  (expected  graduation  during  the  1996-1997  academic  year) 

Finally, J.H. Park (Ch.E.) worked as a post-doctoral student with T. Cale on the development of
simulation test-beds.
2 Summary of Results


As  indicated  by  its  title,  the  project  had  three  main  thrust  areas,  namely  modeling,  simulation,  and  control 
of  processes  that  are  encountered  in  Semiconductor  manufacturing  applications.  The  key  common  theme  of 
these  areas  was  the  use  of  detailed,  first-principles,  physical  models  to  predict  the  process  behavior  under 
various conditions. Such an approach is currently considered an attractive alternative to empirical models,
whose validity is limited by the range of testing conditions.

A  clear  and  impressive  demonstration  of  this  concept  can  be  given  based  on  the  results  of  this  project. 
In  particular,  in  [24-26],  a  detailed  feature-scale  model/simulator  (EVOLVE,  [3])  was  used  as  a  basis  to 
derive  a  simplified  model  and  compute  a  near-optimal  temperature  protocol  for  a  CVD  process  in  trenches, 
that  minimizes  processing  time  subject  to  step-coverage  constraints.  This  protocol  (temperature  trajectory) 
was  validated  by  simulation  against  the  detailed  test-bed  (EVOLVE).  The  basic  ingredient  of  this  protocol, 
computed by employing optimal control principles, was that processing time can be reduced while maintaining
a given step coverage by performing the initial stage of the process at high temperatures and reducing the
temperature as the deposition approaches closure. A simpler version of the same idea was then tested
experimentally,  [15-17],  where  an  excellent  agreement  was  found  between  simulated  and  actual  behavior  of 
the  deposition  process.  The  improvement  of  such  a  time-varying  temperature  protocol,  compared  to  the 
customary  constant  processing  conditions,  is  illustrated  in  Fig.  1.  While  further  details  can  be  found  in 
the  related  references,  these  results  demonstrate  that  significant  process  improvement  can  be  obtained  by 
using  non-standard  processing  protocols,  computed  by  means  of  optimization/optimal  control  techniques 
and detailed, physically-based simulation models. However, the validity of, and confidence in, such protocols
rely heavily on the validity of the models and simulators under non-standard processing conditions. Loosely
speaking, it is expected that process optimization will, in general, involve the use of “extreme” processing
conditions for which experimental data are unlikely to be available a priori. This observation imposes
serious limitations on the applicability of purely empirical models to such process optimization. On the
other hand, detailed first-principles models possess the ability to accurately predict the process behavior,
even under these extreme conditions. Thus, the usefulness of detailed, physically-based models is not limited
to an abstract “improvement of our process understanding”; it has a tangible and important role in
process optimization.

Returning  to  the  project  summary,  our  results  can  be  classified  into  the  three  main  categories  of  research 
thrusts.  This  distinction,  however,  is  not  always  rigid  since  several  of  the  results  contain  components  that 
combine  modeling,  simulation  and  control  concepts. 

Beginning  with  the  modeling  problem,  a  part  of  our  efforts  was  devoted  to  the  dry  oxidation  of  silicon 
and  a  unified  description  of  thin  and  thick  film  oxide  growth  rates  and  interfacial  structure.  The  proposed 
model  invokes  dissociative  chemisorption  in  silicon  at  the  interface  between  the  silicon  dioxide  film  and  the 
substrate. The model supports the view that the diffusing species in such a process is molecular rather than
atomic oxygen, and it predicts a self-limiting oxide film thickness of 0.5-0.6 nm and an inherent interfacial
roughness of approximately one atomic diameter (0.3 nm). Kinetic rate equations developed with this model were found
to  provide  an  excellent  fit  of  experimental  data.  Further  details  are  given  in  publications/reports  [1,2]. 
Figure 5.52 PRCVD Trench sample 5-1.

Figure 5.53 CRCVD Trench sample 6-3.

Figure 1: Experimental comparison of blanket tungsten deposition profiles under Programmed-rate CVD (PRCVD)
and Constant-rate CVD (CRCVD) [17]. For comparable deposition thickness and step coverage, the CRCVD
processing time was 465 sec while PRCVD required only 90 sec (about 80% time savings).
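
The time savings quoted in the Figure 1 caption can be checked directly from the two processing times. The short sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the report.

```python
def time_savings(baseline_s: float, improved_s: float) -> float:
    """Fractional reduction in processing time relative to the baseline."""
    return (baseline_s - improved_s) / baseline_s


crcvd_s = 465.0  # constant-rate CVD processing time from the Figure 1 caption
prcvd_s = 90.0   # programmed-rate CVD processing time from the Figure 1 caption

savings = time_savings(crcvd_s, prcvd_s)
print(f"time savings: {savings:.1%}")  # prints "time savings: 80.6%"
```

The result, 375/465, rounds to the 80% figure given in the caption.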
Next, regarding the simulation component, our efforts were focused on the development of hybrid models
capable of describing the chemical and physical process in a single wafer reactor on the reactor scale and
the feature scale simultaneously. For this task, the challenge is the great disparity of spatial scales between
reactor-scale simulators (CFDSWR, 10 cm, [4,5]) and feature-scale simulators (EVOLVE, 1 μm, [3]). To
address this problem, we have introduced a “mesoscopic-scale model” on the scale ranging from several
feature clusters (1 mm) to one computer chip (1 cm). This model is capable of capturing feature-to-feature
and cluster-to-cluster effects that neither the classical reactor-scale models nor the feature-scale models can
represent, owing to their typical length scales. The derivation of this mesoscopic-scale model relies on
an asymptotic analysis of the boundary layer and utilizes homogenization techniques. This analysis yields
a mathematically rigorous derivation of the model equations, which provide a computationally manageable
approach to self-consistently simulate the interaction of the gas flow on the reactor scale and the global
evolution of the wafer surface. Further details are given in publications/reports [6-11], containing results
for the two- and three-dimensional cases and demonstrations illustrating the effects of cluster density on the
deposition process (SiO2 from TEOS) and the effect of varying operating conditions on micro-loading.

Finally, in our work related to the control of semiconductor processes, we primarily focused on the
so-called “outer-loop” control. In the outer loop, the controller is responsible for generating high-level
commands in the form of trajectories to be followed by certain on-line measurable critical process variables, e.g.,
temperature, pressure, flowrates. Process optimization is performed at this level by computing trajectories
which yield an optimal value for a desired criterion, e.g., minimize processing time and/or material
consumption, subject to process quality constraints (step coverage, uniformity) and constraints imposed by the
trajectory-following ability of the local inner loops (real-time controllers, e.g., PIDs). In its simplest form,
the trajectories are approximated by constants and, treating the map from the inner-loop set-points to the
process outputs as a static nonlinearity, the outer loop becomes the so-called “run-to-run” controller. In our
work to date, we considered applications of run-to-run control in both simulation [21-23] and experimental
[19,20] environments. A key component of our approach is the use of empirical but physically-motivated
models to describe the process accurately (for control purposes), using relatively few adjustable parameters
[22,23]. In this way, the model parameters can be adjusted using a relatively small number of data points
obtained from experiments or simulation of the detailed physically-based models.
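
To make the run-to-run idea concrete, the following is a minimal sketch of a generic exponentially weighted moving average (EWMA) run-to-run loop. It is not the algorithm of [19-23]; it only illustrates the structure described above, in which the set-points are constant within a run, the process is treated as a static map, and a small model is re-tuned from run data. The process function, the gain, the target, and the weight are all hypothetical.

```python
def run_to_run(process, target, gain, u0, weight=0.5, runs=15):
    """Generic EWMA run-to-run loop (illustrative only).

    The model is y ~ offset + gain * u. After each run the offset is
    re-estimated from the measured output, and the next set-point is
    chosen so that the updated model predicts the target.
    """
    u, offset = u0, 0.0
    history = []
    for _ in range(runs):
        y = process(u)  # one processing run at constant set-point u
        offset = weight * (y - gain * u) + (1.0 - weight) * offset
        u = (target - offset) / gain  # set-point for the next run
        history.append((u, y))
    return history


def toy_process(u):
    # Hypothetical static nonlinearity standing in for the map from
    # inner-loop set-points to process outputs.
    return 2.0 + 1.5 * u + 0.05 * u ** 2


history = run_to_run(toy_process, target=10.0, gain=1.5, u0=1.0)
final_u, _ = history[-1]
print(round(toy_process(final_u), 3))
```

Even though the model gain ignores the quadratic term of the toy process, the offset update absorbs the mismatch and the output approaches the target over successive runs. A physically-motivated model with a few adjustable parameters, as advocated above, would replace the single-offset model in this sketch.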

A significant portion of our efforts was devoted to the outer-loop control of LPCVD processes using
optimal control principles. We considered the processes of blanket tungsten deposition and thermally activated
deposition of SiO2 from TEOS, where we derived a “control oriented,” feature-scale model based on power
series expansions. This model was shown to be adequate for the computation of near-optimal temperature
trajectories that minimized processing time subject to step coverage constraints [?, ?]. The process
improvement with the resulting processing protocols was verified with simulations against an accurate feature-scale
simulation test-bed (EVOLVE, [3]) coupled with a reactor-scale simulator (CFDSWR, [4]). Preliminary
experimental results in tungsten deposition have shown a similar process improvement with these protocols,
relative to constant processing conditions [15-18].
On the other hand, detailed simulation test-beds of the processes and associated equipment need to be
developed simultaneously with the control oriented models. Such a concurrent development aims not only
to assess the performance of the process step during design or modification, but also to provide guidance
in the development of control algorithms. For example, a detailed physical model may suggest the explicit
introduction of certain terms in the control oriented model that capture the main nonlinearities or dynamics
in the process behavior [22,23]. This idea was used in [27] to effectively translate the computed optimal
processing conditions at the wafer surface (temperature and partial pressures) into reactor-scale manipulated
variables (temperature and flow rates). A rigorous experimental verification of the process improvement with
these control algorithms is currently under investigation [18].

In  conclusion,  our  work  on  this  project  has  addressed  most  of  the  original  objectives  and  objectives  that 
became  important  in  the  course  of  our  research.  Our  results  demonstrated  the  practical  feasibility  and 
benefits  of  a  systematic  and  rigorous  application  of  numerical  analysis  and  control  theoretic  principles  to  the 
modeling,  simulation  and  control  of  semiconductor  manufacturing  processes.  A  by-product,  of  perhaps  equal 
importance, is the stimulation of a closer collaboration of the PIs with the local semiconductor industry
that  has  resulted  in  several  small  application-oriented  projects,  sponsored  by,  e.g.,  Intel,  Motorola,  Semy. 

3  Publications  and  Reports 

1. T. K. Whidden, P. Thanikanasalam, M. J. Rack, and D. K. Ferry, “The Initial Oxidation of Silicon(100):
A Unified, Chemical Model for Thin and Thick Oxide Growth Rates and Interfacial Structure,” 22nd
Conference on the Physics and Chemistry of Semiconductor Interfaces, Scottsdale, AZ, January 1995;
also in J. Vac. Sci. Technol. B, 13, 4, 1618-1625, Jul/Aug 1995.

2. P. Thanikanasalam and D. K. Ferry, “A Unified Chemical Model for Thermal Oxidation of Silicon
(100) in a Dry Oxygen Ambient,” Technical Report, May 1996.

3.  T.  S.  Cale,  EVOLVE  f.Ob,  A  low  pressure  deposition  simulator.  August  1994. 

4.  J.H.  Park,  User’s  guide  for  CFDSWR:  Computational  Fluid  Dynamics  of  a  Single  Wafer  Reactor. 
Arizona  State  Univ.  1993. 

5. M. Gobbert, “The Simulation Platform CMFSD,” Report, Arizona State University, 1993.

6. C. Ringhofer, “A New Expansion Procedure to Generalize Hydrodynamic Transport Models,” Proc.
International Workshop on Computational Electronics, Portland, OR, May 1994.

7. M. Gobbert, C. Ringhofer, “An Asymptotic Analysis for a Model of Chemical Vapor Deposition on a
Micro Structured Surface,” submitted to SIAM Journal on Applied Mathematics.

8. M. Gobbert, T. Cale, C. Ringhofer, “One Approach to Combining Equipment Scale and Feature Scale
Models,” Proc. 181th Meeting of the Electrochemical Society, Reno, Nevada, March 1995.
9. M. K. Gobbert, T. S. Cale and C. Ringhofer, “The Combination of Equipment Scale and Feature Scale
Models for Chemical Vapor Deposition via a Homogenization Technique,” 4th International Workshop
on Computational Electronics, Tempe, AZ, November 1995.

10. M.K. Gobbert and C. Ringhofer, “Mesoscopic Scale Modeling of Microloading During LPCVD,” J.
Electrochem. Soc., Aug. 1996 (to appear).

11. M. Gobbert, A Homogenization Technique for the Development of Mesoscopic Scale Models for
Chemical Vapor Deposition. Ph.D. Dissertation, ASU, April 1996.

12.  R.  Bammi,  T.  S.  Cale  and  G.  Grivna,  “Development  of  a  Gate  Metal  Etch  Process  for  Gallium  Arsenide 
Wafers,” Thin Solid Films, 253, 501, 1994.

13. R. Bammi, Development of a Gate Metal Etch Process for Gallium Arsenide, MS Thesis, Arizona State
University, Arizona, April 1994.

Deposition of Ti-W Films,” to be presented at the 42nd National Symposium and Topical Conferences

15. K. M. Tracy, S. Bolnedi, T. S. Cale, “Programmed Rate Chemical Vapor Deposition of Blanket
Tungsten Thin Films,” to be presented at the Proc. 12th International VLSI Multilevel Interconnection
Conf., T. Wade Ed., VMIC, 643, 1995.

16.  K.  M.  Tracy,  S.  Bolnedi,  G.  J.  Leusink  and  T.  S.  Cale,  “Blanket  Tungsten  Film  Deposition  Using 
Programmed Rate CVD,” Advanced Metallization and Interconnect Systems for ULSI Applications in
1995, MRS, in press.

17. K.M. Tracy, Programmed Rate Chemical Vapor Deposition: Blanket Tungsten Film Characterization.
M.S. Thesis, ASU, May 1996.

18. J. Kristof, K. Tracy, L. Song, K. Tsakalis and T. Cale, “Programmed Rate Chemical Vapor Deposition
of Tungsten,” to appear in TECHCON ’96, SRC, Phoenix, Sept. 1996.

19. K. Stoddard, P. Crouch, M. Kozicki, and K. Tsakalis, “Application of Feedforward and Adaptive
Feedback Control to Semiconductor Device Manufacturing,” Proc. IEEE American Control Conference,
892-896, Baltimore, MD, 1994.

20. K. Stoddard, Application of Feed-Forward and Adaptive Feedback Control to Semiconductor Device
Processing. MS Thesis, Arizona State University, Arizona, December 1994.

21. K. S. Tsakalis and L. Song, “Set-Membership Estimation for Weakly Nonlinear Models: An Application
to the Adaptive Control of Semiconductor Manufacturing Processes,” Proc. IEEE Conference on
Decision and Control, 1066-1071, Orlando, Florida, Dec. 1994.

22. T. Cale, P. E. Crouch, S. Shen, and K. S. Tsakalis, “A Simple Adaptive Optimization Algorithm for
the Tungsten LPCVD Process,” Proc. IEEE American Control Conference, 1294-1298, Seattle, 1995.
4 Appendix


The  Appendix  contains  copies  of  selected  recent  publications  and  reports  that  provide  a  detailed  description 
of  the  results  mentioned  in  the  Summary.  The  publications  included  here  are: 

[1,2,9,10,11,15,17,18,19,21,22,23,24,25,26,27] 

Copies  of  the  rest  of  the  publications  that  were  supported  by  the  grant  are  available  upon  request. 
A model for silicon oxidation that invokes dissociative chemisorption of molecular oxygen at the
interface between silicon dioxide and silicon is described. The model accounts for a self-limiting
oxide film thickness of 0.5-0.6 nm (for oxidations performed at temperatures sufficient to dissociate
surface dimers and permit oxygen penetration of the substrate beyond a single monolayer of
suboxide). Detailed examination of the model suggests a mechanism for an inherent oxide/silicon
interface roughness of approximately one atomic diameter. Kinetic rate equations developed from
the model successfully account for the observed power law dependence of rate on oxygen partial
pressure. These relationships were used in the derivation of an expression for the variation of oxide
film growth rate with overlying oxide thickness. The relationship is tested against experimental
observations reported in the literature and found to give an excellent fit. © 1995 American Vacuum
Society.


I.  INTRODUCTION 

The oxide of silicon is a uniquely critical material within silicon semiconductor technology, in that
it is a major component for metal-oxide-semiconductor (MOS) devices, as well as having importance in
other metal-insulator-semiconductor (MIS) and tunneling structures.1 Until device fabrication began to
delve into the submicron regime, the understanding of oxide formation mechanisms and the accompanying
parametric processing relationships that had been developed in the mid-1960s was adequate for
determining the process and materials properties concerns of device performance. As device geometries
have shrunk, however, a number of previously unimportant or unobserved issues have been found to
require resolution in order for deep submicron and nanoscale devices to be constructed.2

Models employed in discussing the thermal oxidation of silicon have usually been developed within the
framework originally formulated by Deal and Grove in 1965.3 They proposed that the oxidation rate was
determined by a combination of two processes. The first involved the actual chemical reaction of oxygen
with silicon at the oxide/substrate interface, while the second was the diffusion of oxygen through the
previously formed oxide film. The combination of these processes resulted in the formulation of the
classical “linear-parabolic” rate law of Deal and Grove:

$$\frac{dx}{dt} = \frac{F}{N_1} = \frac{k C^*/N_1}{1 + k/h + k x/D_{\mathrm{eff}}}, \tag{1}$$

where $x$ is the oxide thickness, $F$ the total flux of oxidant molecules through the oxide, $k$ the
first-order rate constant for the oxidation, $C^*$ the concentration of oxidant at the oxide surface,
$N_1$ the number of oxidant molecules incorporated into a unit volume of the oxide layer, $h$ the gas
phase transport coefficient of oxygen, and $D_{\mathrm{eff}}$ the effective diffusion coefficient of
oxygen in silicon dioxide. When the differential equation is solved, Eq. (1) may be rewritten as

$$x^2 + A x = B(t + \tau), \tag{2}$$

where

$$A = 2 D_{\mathrm{eff}} \left( \frac{1}{k} + \frac{1}{h} \right), \tag{3}$$

$$B = \frac{2 D_{\mathrm{eff}} C^*}{N_1}, \tag{4}$$

$$\tau = \frac{x_0^2 + A x_0}{B}. \tag{5}$$

The Deal-Grove model requires the parameter $\tau$ (a shift in the time coordinate) in order to account
for the presence of an initial oxide layer (of thickness $x_0$) on the silicon surface.
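
As a small numerical illustration of Eqs. (2)-(5), the sketch below solves the quadratic in Eq. (2) for the oxide thickness at a given time. The values chosen for A, B, and the initial thickness are placeholders picked only to exercise the formula; they are not fitted parameters from this paper, and the function name is illustrative.

```python
import math


def deal_grove_thickness(t, A, B, x0=0.0):
    """Positive root of x^2 + A*x = B*(t + tau), with tau from Eq. (5).

    t  : oxidation time
    A  : linear-parabolic parameter of Eq. (3), units of length
    B  : parabolic rate constant of Eq. (4), units of length^2 / time
    x0 : initial oxide thickness, same length unit as A
    """
    tau = (x0 ** 2 + A * x0) / B
    return 0.5 * (-A + math.sqrt(A ** 2 + 4.0 * B * (t + tau)))


# Placeholder parameters (micrometres and hours), for illustration only.
A_um = 0.165
B_um2_per_h = 0.0117
x0_um = 0.02

for hours in (0.0, 0.5, 1.0, 4.0):
    x = deal_grove_thickness(hours, A_um, B_um2_per_h, x0_um)
    print(f"t = {hours:4.1f} h  ->  x = {x:.4f} um")
```

At t = 0 the function returns the initial thickness, as Eq. (5) requires. For short times the growth is approximately linear, x ≈ (B/A)(t + τ), and for long times it is approximately parabolic, x ≈ √(Bt), which is the origin of the name “linear-parabolic.”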

The above approach was found to provide an excellent fit to experimental data for oxidation processes
utilizing H2O/O2 mixtures and for dry oxidation processes at thicknesses in excess of ca. 40 nm.4
Deal-Grove estimates fail, however, in predicting oxidation rates in dry oxygen within the thin regime
(<20 nm) and in accounting for the observed pressure dependence of both thin and thick oxidations.
Within the thin regime, observed oxidation rates in dry processes are significantly greater than
Deal-Grove predictions and the reaction exhibits a power law dependence on the oxygen pressure with
rate proportional to $P^m$, where $m$ = 0.6-0.8 (Refs. 5 and 6).
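
A power law exponent of this kind is usually read off as the slope of a log-log plot of rate against pressure. The sketch below shows that calculation on synthetic data generated with an assumed exponent of 0.7; the pressures, the prefactor, and the exponent are hypothetical and are not measurements from this paper or its references.

```python
import math


def power_law_exponent(pressures, rates):
    """Least-squares slope of log(rate) versus log(pressure)."""
    xs = [math.log(p) for p in pressures]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: rate = 2.0 * P**0.7 (hypothetical prefactor and exponent).
pressures = [0.1, 0.3, 1.0, 3.0]
rates = [2.0 * p ** 0.7 for p in pressures]

m = power_law_exponent(pressures, rates)
print(f"fitted exponent m = {m:.3f}")
```

On noise-free synthetic data the fit recovers the exponent that was put in. With real rate data, a slope in the 0.6-0.8 range rather than 1 is the pressure dependence that the text says the Deal-Grove model fails to account for.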

A variety of models have been proposed which address the failures of the Deal-Grove model in the thin
oxide limit. Space-charge effects have been postulated to give enhanced rates in thin oxidations.7 The
phenomenon requires the oxidant to exist and diffuse in ionic form and is operative only over oxide
thicknesses comparable to the extent of the space-charge region. Based on a detailed analysis of
experimental parameters, oxidation results, and the effects of the electric field at the interface on
oxygen transport, it generally has been concluded that space-charge effects are not the mechanism
giving rise to enhanced oxidation rates within the thin oxidation regime.8 Other attempts at explaining
elevated rates in thin oxidations proposed structural differences between thin and thick oxides, which
lead to enhanced oxygen diffusion in the thin region.9 The presence of microchannels with diameters of
about 5 nm has been proposed to increase oxygen transport in thin oxides.10 Some experimental evidence
for such microchannels exists.11 Stress at the oxide/substrate interface is also thought to affect the
rate of oxidation.10 Stress has been invoked to vary either the transport of oxygen through the oxide
film or the intrinsic reaction rate at the interface.12-14 Finally, variations in the oxidizing species
have been suggested as potential sources of variation in oxidation rates in the thin and thick
regimes.15 Models which invoke the presence of atomic oxygen as the primary diffusing species cannot be
ruled out on the basis of current evidence.

An examination of the literature and available experimental evidence does not permit a definitive
choice of model for the oxidation of silicon. Here, we begin an examination of silicon oxidation with a
model based on the dissociative chemisorption of molecular oxygen at the oxide/silicon interface. A
fundamental rate equation is developed from this model and this determines the inherent rate of
oxidation of silicon under a given set of process conditions. This equation, with appropriate
modifications for the diffusion-controlled availability of oxygen at the interface, is capable of
predicting oxidation rates in both thin and thick oxide regimes.


II.  PROPOSED  SILICON  OXIDATION  MECHANISM 

Our conceptual model is a process in which the oxidation of silicon involves dissociative chemisorption
of molecular oxygen. Figure 1(a) shows an idealized schematic of the silicon (100) surface with the
classic (2×1) reconstruction. Sphere sizes in the schematic diagram reflect the relative covalent radii
of silicon and oxygen. A number of different adsorption sites on the reconstructed surface are apparent
in the diagram. The nature of the individual sites and the calculated energetics for oxygen adsorption
have been reported in the literature.16,17 It is instructive to approach the chemisorption of the oxygen
molecule on this surface from the point of view of the molecular orbitals of the oxygen molecule. The
molecular orbital energy diagram for ground state oxygen is shown in Fig. 2(a). The highest-energy,
occupied molecular orbitals consist of two half-filled orthogonal and degenerate orbitals of antibonding
symmetry. 95% probability surfaces for the orbitals are shown in Fig. 2(b). Interactions between the
oxygen molecule and the silicon surface are initiated by the mixing of the orbitals with the
highest-energy orbitals of the surface. The highest-energy orbital on the silicon (2×1) surface is
postulated to be a half-filled “dangling bond” on each silicon atom in the dimer units, presumably of
sp^3 symmetry. Simple considerations of the relative geometries of the surface “dangling bond” (expected
to project from the dimer into the gap between dimer rows) and the orthogonal, half-filled π* orbitals
on the oxygen molecule suggest that maximum mixing will occur when adsorption takes place with the
geometry depicted in the initial oxygen chemisorption step shown in Fig. 1(a). Since the interacting
orbitals on the oxygen molecule are π* in character, incorporation of electron density from the silicon
acts to weaken the oxygen-oxygen bond and leads to dissociation of the oxygen molecule and insertion of
the oxygen atoms into nearby silicon-silicon back bonds. Whether the chemisorption, bond dissociation,
and bond insertion occur as a concerted reaction or in discrete steps cannot be distinguished without
extensive additional experimental characterizations.

Fig. 2. Molecular orbital (a) energy diagram for the dioxygen molecule; (b) 95% probability surfaces for
molecular orbitals.

Experimental evidence from surface analytical studies conducted in our laboratories and other
facilities18 suggests that, under anhydrous conditions, three different initial reactions may occur,
depending upon the temperature employed in the oxidation process. Figure 1(a) shows a possible reaction
scheme for the dissociation of molecular oxygen on the silicon surface at room temperature. At low
temperature, even though Si-O bond formation may be thermodynamically favored, there appears to be
insufficient energy for dissociation of the silicon dimer bonds, and the reaction may be kinetically
limited to the formation of oxygen bridges between dimers, as shown, with oxygen coverage of ca. 0.5
monolayer (Table I). Experimental evidence supporting the presence of such a structure has been
reported.18 As the temperature of the process is raised, sufficient energy for the dissociation of the
surface dimers may become available and (1×1) oxygen surface superlattices can evolve from more
extensive oxygen coverage (1 monolayer). Again, limited experimental evidence for the formation of such
structures is available in the literature.19 The exact temperature at which such a structure can form
has not been determined, but a cursory examination of relative bond strengths permits an estimate of the
temperature range for the formation of the fully oxidized surface. Si-H bonds are known to dissociate at
temperatures of 450 °C and these have a dissociation energy of 80-90 kcal/mol. Si-Si bond dissociation
energies for the surface dimers have not been determined, but it is instructive to compare them with the
Si-Si dissociation energy in disilane, 74 kcal/mol. It is reasonable to expect that this energy
represents an upper limit on the dissociation energy of the surface dimers. Consequently, silicon
surface dimers may reasonably be expected to dissociate at temperatures around 300-400 °C. Experimental
evidence on the formation of full oxygen monolayers in some intermediate temperature oxidations would
appear to support this projection.20

Table I. Auger spectral analyses for HF-treated and sputter-cleaned Si(100) surfaces.

Surface and exposure      Ratio of Si_SiO/Si_SiO2    Oxide coverage, ML (Å)    Reacted oxygen (%)

HF treated
  60 min dry oxygen       0.07                       0.14, 0.32                49.1
  60 min ambient air      0.19                       1.02, 2.36                84.7

Sputter cleaned
  60 min dry oxygen       0.33                       1.16, 2.69                90.7
  60 min ambient air      0.30                       1.77, 4.10                105.7
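
The paper does not say how the 300-400 °C estimate was obtained from the bond energies. One simple reading, offered here purely as an assumption, is that the absolute dissociation temperature scales linearly with the bond dissociation energy, using Si-H (450 °C, 80-90 kcal/mol) as the reference and the disilane Si-Si value (74 kcal/mol) as the target. The function name is illustrative.

```python
def scaled_dissociation_temp_c(ref_temp_c, ref_energy, energy):
    """Scale an absolute dissociation temperature linearly with bond energy.

    Assumption (not stated in the paper): T_diss in kelvin is proportional
    to the bond dissociation energy. Energies may be in any common unit.
    """
    ref_temp_k = ref_temp_c + 273.15
    return ref_temp_k * energy / ref_energy - 273.15


si_h_temp_c = 450.0      # Si-H dissociation temperature quoted in the text
si_si_energy = 74.0      # Si-Si dissociation energy in disilane, kcal/mol

# The Si-H energy is quoted as a range, so evaluate both ends.
low = scaled_dissociation_temp_c(si_h_temp_c, 90.0, si_si_energy)
high = scaled_dissociation_temp_c(si_h_temp_c, 80.0, si_si_energy)
print(f"estimated Si-Si dimer dissociation: {low:.0f}-{high:.0f} °C")
```

Under this assumption the estimate comes out at roughly 321-396 °C, which falls inside the 300-400 °C window given in the text. Because the disilane value is taken as an upper limit on the dimer bond energy, the scaled temperatures are upper estimates under the same assumption.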

Atmospheric pressure oxidations carried out at higher temperatures (but below temperatures at which
sustained oxidation occurs) appear to self-limit at thicknesses of somewhat less than 1 nm. Typical
values of 0.6-0.7 nm have been reported. A possible rationale for these observations is shown in
Fig. 1(b), where a mechanism which utilizes the iterative action of molecular oxygen chemisorption at
the oxide/silicon interface followed by O2 bond dissociation and insertion into silicon back bonds
produces a layer-by-layer growth of the oxide film. The dependence of the mechanism on the chemisorption
of molecular oxygen via orbital mixing with the unoxidized silicon at the interface necessarily limits
the extent to which the reaction can proceed without some additional driving force. It may be seen from
Fig. 1(b) that unoxidized silicon becomes unavailable to surface adsorbed oxygen after the formation of
two fully oxidized layers of silicon, and the reaction will therefore cease at this point for any
temperature below those at which appreciable oxygen diffusion to the interface can occur. The calculated
thickness of this self-limited layer of oxide films is 0.5-0.6 nm, consistent with the experimentally
observed thickness in self-limited oxide films.

This model thus appears to adequately describe the structural
phenomena that have been reported for dry oxidations
of clean silicon surfaces. It must be noted that the model
does not address oxidations in which moisture is present, nor
does it address oxidations and native oxide thicknesses on
surfaces that have been exposed to moisture at any time prior
to dry oxidation. In such cases, distinctly different mechanisms
of oxidation can be expected due to the presence of
water.

Fig. 3. A possible mechanism for the origin of inherent interfacial roughness
between the silicon dioxide film and the substrate (see the text).

The model may also provide a mechanistic rationale for
the observation that oxide/silicon interfaces appear to exhibit
an inherent interfacial roughness of about one atomic diameter.
Consider Fig. 3, in which the (011) projection of the
silicon (100) surface is shown during successive stages of
oxidation. At the top of the figure a mixed configuration for
the initial oxidation layer is depicted in which both the
(1×1) and (2×1) oxygen bridged structures are present on
the surface. Such a configuration might arise when oxidation
of the silicon is permitted during transient temperature ramping.
In such an arrangement, areas of the surface that have
retained the (2×1) structure effectively cause a temporary
barrier to oxygen insertion in the underlying back bonds,
while oxidation may proceed in the areas with the more open
(1×1) oxygen/silicon arrangement. Oxidation of the top
layer back bonds gives a juncture of the two types of surface
structure as shown in the second and third schematics of the
figure. As the oxidation proceeds, and the dimer is eventually
oxidized, the discrepancy between the oxidation depth under
initially (1×1) and (2×1) surfaces may be expected to
propagate with the interface, giving rise to an inherent interfacial
roughness of 0.3 nm, approximately the diameter of an
SiO grouping. Oxidation to the final condition exhibiting
such interfacial roughening is depicted in the lower three
schematics of Fig. 3.

III. SILICON OXIDATION KINETICS

Utilizing the conceptual model developed in the previous
discussion, rate equations for the oxidation of silicon by dry
oxygen can be developed. The model suggests that the reaction
occurs via classical Langmuir kinetics, with the production
of an intermediate species prior to the final reaction to
product SiO2. We currently formulate this intermediate as
“SiO,” although it is acknowledged that a completely rigorous
treatment would require a more extensive analysis of the
intermediate species and a subsequently more detailed series
of equations to describe the chemisorption process. Such rigorous
treatments will be pursued in future work. Following
the Langmuir formalism, the reaction sequence for silicon
oxidation may be simply written:

$$\mathrm{Si} + \tfrac{1}{2}\mathrm{O_2} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{SiO},$$

$$\mathrm{SiO} + \tfrac{1}{2}\mathrm{O_2} \overset{k_2}{\longrightarrow} \mathrm{SiO_2},$$

where the first reaction represents the equilibrium between
molecular oxygen chemisorbed at the substrate/oxide interface
and the substrate silicon and the second is an irreversible
reaction of the intermediate with additional chemisorbed
oxygen to yield the product SiO2. $k_1$, $k_{-1}$, and $k_2$ are the
usual rate constants. From the second of these equations, the
rate of formation of the product, $d[\mathrm{SiO_2}]/dt$, may be expressed as

$$\frac{d[\mathrm{SiO_2}]}{dt} = k_2[\mathrm{SiO}][\mathrm{O_2}]^{1/2}, \tag{6}$$

where [SiO] is the concentration of the partially oxidized
intermediate at the interface and [O2] the concentration of
chemisorbed molecular oxygen available at the interface.
The concentration of oxygen can be determined using diffusional
considerations similar to those employed in the Deal-Grove
model and most other current reaction models.
Steady-state assumptions are not valid in this reaction since
the availability of oxygen in the equilibrium reaction will
vary within the reaction time frame due to diffusion limitations.
The usual equilibrium relationships therefore will not
be considered. However, absolute expressions for reaction
rates remain valid and the concentration of the intermediate
species may be determined by examining its rate of change:

$$\frac{d[\mathrm{SiO}]}{dt} = k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2} - k_{-1}[\mathrm{SiO}] - k_2[\mathrm{SiO}][\mathrm{O_2}]^{1/2}. \tag{7}$$

We know that the concentration of the SiO intermediate at
the interface is a constant after the formation of a thick layer
of oxidized silicon; therefore $d[\mathrm{SiO}]/dt = 0$ and so, in this
limit,

$$[\mathrm{SiO}] = \frac{k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}. \tag{8}$$

Substituting this expression into Eq. (6), the rate of formation
of the silicon dioxide film can be expressed as

$$\frac{d[\mathrm{SiO_2}]}{dt} = \frac{k_1 k_2[\mathrm{Si}][\mathrm{O_2}]}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}, \tag{9}$$

and the growth rate of the film may be written as

$$N\frac{dx}{dt} = \frac{k_1 k_2[\mathrm{Si}][\mathrm{O_2}]}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}, \tag{10}$$

where N is a conversion factor for film thickness to [SiO2]
analogous to that employed in the Deal-Grove model. Since
[Si] is a constant in the system, it may be seen that the film
growth rate is controlled solely by the oxygen concentration
at the oxide/substrate interface. Equation (10) results from
the fundamental reaction-rate equation (9) and is applicable
for the oxidation of silicon by dry oxygen in the thick limit.
It is precisely this function that must be modified for thin
oxides in the transient regime.
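
A short numerical sketch can show how the thick-limit rate law of Eq. (10)
depends on the interfacial oxygen concentration. The code below is
illustrative only: the rate constants, [Si], and N are set to 1 in arbitrary
units (hypothetical values, not fitted parameters), and the function names
are assumptions. It evaluates the local exponent d ln(rate)/d ln[O2], which
for the form of Eq. (10) lies between 1/2 and 1.

```python
import math


def growth_rate(o2, k1=1.0, k_minus1=1.0, k2=1.0, si=1.0, n=1.0):
    """Thick-limit growth rate dx/dt from Eq. (10), arbitrary units."""
    return k1 * k2 * si * o2 / (n * (k_minus1 + k2 * math.sqrt(o2)))


def pressure_exponent(o2, rel_step=1e-4, **constants):
    """Local slope d ln(rate) / d ln[O2], by central difference in log space."""
    upper = growth_rate(o2 * (1.0 + rel_step), **constants)
    lower = growth_rate(o2 * (1.0 - rel_step), **constants)
    d_ln_rate = math.log(upper) - math.log(lower)
    d_ln_o2 = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return d_ln_rate / d_ln_o2


# Low [O2]: k_-1 dominates the denominator and the exponent approaches 1.
# High [O2]: k2*[O2]^(1/2) dominates and the exponent approaches 1/2.
for o2 in (1e-4, 1.0, 1e4):
    print(f"[O2] = {o2:g}: exponent = {pressure_exponent(o2):.3f}")
```

With all constants equal to 1 the exponent is 0.75 at [O2] = 1. A fractional
exponent of this kind is a direct consequence of the half-order terms in the
reaction sequence.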

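To allow for the non-steady-state behavior of the oxygen
concentration, we return to Eq. (7) and multiply both sides of
the equation by the integrating factor $e^{\gamma t}$. Letting
$k_{-1} + k_2[\mathrm{O_2}]^{1/2} = \gamma$, we can express the derivative of the
intermediate concentration with respect to time as

$$\frac{d}{dt}\left\{[\mathrm{SiO}]\,e^{\gamma t}\right\} = \left\{k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}\right\} e^{\gamma t}. \tag{11}$$

Integration gives

$$[\mathrm{SiO}] = \frac{k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-\gamma t}\right). \tag{12}$$

Substituting into Eq. (6), the rate of formation for silicon
dioxide can then be expressed as:

$$\frac{d[\mathrm{SiO_2}]}{dt} = N\frac{dx}{dt} = \frac{k_1 k_2[\mathrm{Si}][\mathrm{O_2}]}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-\gamma t}\right). \tag{13}$$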
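
The integrating-factor solution can be checked numerically. The sketch below
is an illustration under stated assumptions, not the authors' procedure: it
holds [O2] fixed during the integration and starts from [SiO] = 0 at t = 0
(the conditions under which Eq. (12) follows from Eq. (7)), uses hypothetical
rate constants in arbitrary units, and integrates Eq. (7) with a classical
fourth-order Runge-Kutta scheme for comparison with the closed form.

```python
import math


def sio_analytic(t, k1, k_minus1, k2, si, o2):
    """Closed-form [SiO](t) from Eq. (12), assuming constant [O2], [SiO](0) = 0."""
    gamma = k_minus1 + k2 * math.sqrt(o2)
    return k1 * si * math.sqrt(o2) / gamma * (1.0 - math.exp(-gamma * t))


def sio_numeric(t_end, k1, k_minus1, k2, si, o2, steps=2000):
    """Integrate Eq. (7) with classical RK4, starting from [SiO] = 0."""
    root_o2 = math.sqrt(o2)

    def rhs(sio):
        return k1 * si * root_o2 - k_minus1 * sio - k2 * sio * root_o2

    dt = t_end / steps
    sio = 0.0
    for _ in range(steps):
        s1 = rhs(sio)
        s2 = rhs(sio + 0.5 * dt * s1)
        s3 = rhs(sio + 0.5 * dt * s2)
        s4 = rhs(sio + dt * s3)
        sio += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return sio


# Hypothetical constants in arbitrary units, chosen only for illustration.
PARAMS = dict(k1=2.0, k_minus1=0.5, k2=1.5, si=1.0, o2=0.64)

for t in (0.1, 1.0, 5.0):
    print(f"t = {t}: numeric = {sio_numeric(t, **PARAMS):.6f}, "
          f"analytic = {sio_analytic(t, **PARAMS):.6f}")
```

At long times both results approach the steady-state value of Eq. (8), and
at short times [SiO] grows linearly from zero.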


We note that this latter equation predicts no [SiO] for thin
oxides (small t). We return to this point below. Setting
$\gamma t = x/a$, which equates the total amount of intermediate
reacted to produce the oxide film with the film thickness divided
by some unit length a, the relationship for film
thickness versus time becomes

$$N\frac{dx}{dt} = \frac{k_1 k_2[\mathrm{Si}][\mathrm{O_2}]}{k_{-1} + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-x/a}\right). \tag{14}$$

Relation (14) describes the growth of the oxide in terms
of the concentration of molecular oxygen at the oxide/substrate
interface. Since this concentration cannot be experimentally
determined, we need to express the interfacial concentration
of oxygen in terms of the oxygen concentration at
the oxide surface, a known quantity that may be calculated
from the oxygen partial pressure in the gas phase above the
substrate. To obtain this relationship we make use of Fick’s
laws, and the continuity of flux, in a derivation analogous to
that of the Deal-Grove model. The three fluxes are expressed,
in steady state, as

$$F_1 = h(C^* - C_0), \qquad F_2 = (D_{\mathrm{eff}}/x)(C_0 - C_i), \qquad F_3 = k_3 C_i. \tag{15}$$

Continuity of flux requires $F_1 = F_2 = F_3$, and we can solve
for $C_0$ and for $C_i$, where $C_0$, $C_i$, and the other variables are
the usual variables in the Deal-Grove derivation:

$$C_0 = \frac{xhC^*}{D_{\mathrm{eff}} + xh} + \frac{D_{\mathrm{eff}}C_i}{D_{\mathrm{eff}} + xh}, \tag{16}$$

$$C_i = \frac{hD_{\mathrm{eff}}C^*}{k_3 D_{\mathrm{eff}} + hD_{\mathrm{eff}} + xk_3 h} \approx \frac{D_{\mathrm{eff}}C^*}{D_{\mathrm{eff}} + k_3 x}, \tag{17}$$

since $h \gg k_3$. Substituting for $C_i$ from Eq. (17) into Eq. (16),

$$C_0 = \frac{xhC^*}{D_{\mathrm{eff}} + xh} + \frac{D_{\mathrm{eff}}}{D_{\mathrm{eff}} + xh}\,\frac{D_{\mathrm{eff}}C^*}{D_{\mathrm{eff}} + k_3 x}. \tag{18}$$

Values for $k_3$ and $D_{\mathrm{eff}}$ under typical thermal oxidation
conditions have been previously determined. At 1000 °C and
760 Torr of oxygen, $k_3 = 3.6\times10^{4}$ μm/h and $D_{\mathrm{eff}} = 2531$
μm²/h (Ref. 8). Using these values, the following approximations
may be made for oxidation under similar conditions:

$$C_0 \approx C^*, \qquad C_i \approx \frac{D_{\mathrm{eff}}C^*}{D_{\mathrm{eff}} + k_3 x}. \tag{19}$$

Rewriting Eq. (13) using these forms for the concentrations
of oxygen and reformulating the constants yield

$$N\frac{dx}{dt} = k_3 C_i + aC_0 e^{-x/a}, \tag{20}$$

and substituting for $C_i$ and $C_0$ gives

$$N\frac{dx}{dt} = \frac{k_3 D_{\mathrm{eff}}C^*}{D_{\mathrm{eff}} + k_3 x} + aC^* e^{-x/a}, \tag{21}$$

$$\frac{dx}{dt} = \frac{k_3 D_{\mathrm{eff}}C^*}{N(D_{\mathrm{eff}} + k_3 x)} + \frac{aC^* e^{-x/a}}{N}. \tag{22}$$

This expression for dx/dt may be compared with existing
data on the oxidation rate of silicon for evaluation of the
model. In the thin film limit (<20 nm), it should accurately
predict the observed enhancement in oxide growth rates as
compared with the rates at greater oxide thicknesses. Beyond
oxide thicknesses of 20 nm, the relationship must give rates
that are comparable with those predicted by the Deal-Grove
model and with experimental observations. The parameter a
in the exponential term should be a characteristic length
which determines the onset of diffusional effects in the
relationship.
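
The flux-continuity condition F1 = F2 = F3 can be verified numerically. The
sketch below is an illustration, not the authors' code: it uses the k3 and
Deff values quoted in the text, but the gas-phase transfer coefficient h and
the normalized surface concentration C* = 1 are hypothetical choices made
only so that h is much larger than k3. It solves for C0 and Ci, confirms
that the three fluxes agree, and compares the exact Ci with the
large-h approximation Deff C*/(Deff + k3 x).

```python
K3 = 3.6e4      # um/h, interface rate constant quoted in the text
D_EFF = 2531.0  # um^2/h, effective diffusivity quoted in the text
H_GAS = 1.0e8   # um/h, hypothetical gas-phase transfer coefficient (h >> k3)


def interface_concentrations(x, c_star=1.0, h=H_GAS, k3=K3, d_eff=D_EFF):
    """Return (C0, Ci) satisfying F1 = F2 = F3 at oxide thickness x (um)."""
    c_i = h * d_eff * c_star / (k3 * d_eff + h * d_eff + x * k3 * h)
    c_0 = (x * h * c_star + d_eff * c_i) / (d_eff + x * h)
    return c_0, c_i


def fluxes(x, c_star=1.0, h=H_GAS, k3=K3, d_eff=D_EFF):
    """Return (F1, F2, F3) for the concentrations that satisfy continuity."""
    c_0, c_i = interface_concentrations(x, c_star, h, k3, d_eff)
    return h * (c_star - c_0), d_eff / x * (c_0 - c_i), k3 * c_i


def approx_interface_concentration(x, c_star=1.0, k3=K3, d_eff=D_EFF):
    """Large-h approximation for Ci."""
    return d_eff * c_star / (d_eff + k3 * x)


for x_um in (0.005, 0.05, 0.5):
    c0, ci = interface_concentrations(x_um)
    print(f"x = {x_um} um: C0 = {c0:.5f}, Ci = {ci:.5f}, "
          f"approx Ci = {approx_interface_concentration(x_um):.5f}")
```

With h this large, C0 stays within a fraction of a percent of C* over the
whole thickness range, which is the limit in which the surface concentration
can be replaced by C*.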


IV. COMPARISON: MODEL VERSUS EXPERIMENT

The result of Eq. (22) was calculated and compared with
experimental data in the literature. The most reliable data for
thin oxidations, the region of interest, appear to be those of
Massoud, Plummer, and Irene,8 and comparison with these
results was used as a benchmark in evaluating parameters in
the model. It was found that the growth rates predicted by
the model were strongly influenced by the ratio $k_3/a$.
With $k_3/a < 1$, enhanced growth rates were calculated for the
initial phase of oxidation with the model reducing to typical
Deal-Grove values in the thick limit. Comparisons of
growth rate versus thickness curves for our model, the model
of Massoud, Plummer, and Irene, and Massoud’s experimental
data are presented in Fig. 4 for oxidation rates in the thin
oxide regime. Our analytical model may be seen to predict
enhanced growth rates in the early oxidation, with the interception
point of various forms of the model determined by
the value of a in the exponential term. Based on our derivation,
a should represent some characteristic length within the
analysis. Various tests for values of a were conducted in
order to determine the best fit to the experimental data. It
may be seen from Fig. 4 that an almost ideal fit of model to
data is achieved with a value of a = 4.24 nm. We believe
that this value denotes the oxide thickness at which diffusional
limitations on the oxygen concentration at the interface
begin to control the reaction.

Fig. 4. Growth rate vs thickness relationships for silicon (100) dry oxidations at 1000 °C.

V. AUGER MICROPROBE ANALYSES

The initial oxidation of a (2×1) silicon (100) surface at
room temperature is studied with Auger electron spectroscopy.
Cleaned surfaces are passivated in dilute HF and
heated to 600 °C instead of the usual high-temperature
sputter-clean/anneal cycles employed in many previous
studies.21-25 Water vapor in the oxygen gas at 1 atm pressure
used for exposures is below 0.5 ppm. The effects of surface
roughening and/or the introduction of water vapor on these
surfaces are measured by comparison with sputter-cleaned
surfaces and ambient air exposures. Hydrogen-desorbed silicon
surfaces have less than 0.032 nm coverage after exposure to
the ultradry oxygen at room temperature. The presence of
water vapor allowed this to increase to 0.236 nm. It is possible
to account for the presence of excess oxygen on the
surface by using a calculation technique based upon decomposition
of the silicon LVV Auger spectra.

Clean silicon (100) samples, p type of 5-10 Ω cm, were
prepared by first using a 10:1 HF etch, with organic residuals
removed in a 70:30 sulfuric acid:hydrogen peroxide solution
followed by a 10 min dip in 1% HF in order to hydrogen-passivate
the surface.26,27 Samples are immediately loaded
into an ultrahigh vacuum chamber of a PHI 600 Scanning
Auger Multiprobe and heated to 600 °C for 30 min for hydrogen
desorption and surface reconstruction.21,28,29 Cooled
samples are exposed, in one case, to 1 atm ultrapure oxygen
(99.9999%, <0.5 ppm water vapor) in the load lock of the
Auger system at room temperature, then reinserted into the
vacuum chamber through the load lock without any exposure
to ambient air. In order to verify that surfaces are reactive,
samples are, alternatively, exposed to moist ambient air. A
part of the sample is sputter cleaned in order to compare the
reactivity of smooth, hydrogen-desorbed surfaces to sputter-cleaned,
roughened surfaces under the same conditions. In
all cases, cleaned surfaces have no detectable oxygen or fluorine
and negligible carbon before heating. After heating,
there is some oxygen on the surface, probably due to chamber
or sample stage outgassing of water vapor, but it is below
2%. The vacuum remained at 10^-10 Torr except for brief
transients when power is applied maximally to the heating
filament.

Figure 5 shows the measured silicon LVV Auger spectra
for the HF-treated samples, while Fig. 6 depicts the sputter-cleaned
samples under comparable conditions. The hydrogen-passivated
line shape differs from the sputter-cleaned
surface noticeably in that the former has a prominent peak
(denoted a) at about 15.5 eV below the main peak at 89 eV.
This is presumably the bulk plasmon loss peak which is also
larger in the backscattered primary spectrum. This peak is
reduced by heating and subsequent cooling or by heavy electron
beam bombardment as would be expected for hydrogen
desorption and a change in surface structure from (1×1) to
(2×1). The Auger spectrum for the HF-treated sample exposed
to ultradry oxygen for 60 min shows virtually no sign
of oxidation, while the sputtered surface shows the predictable
signs of early stage oxidation: a peak at 82 eV (denoted
b) usually assigned to SiOx and one at 76 eV (denoted c) for
SiO2.30,31 More activity is displayed in the spectrum of
smooth hydrogen-desorbed surfaces exposed to moist air.

Fig. 5. Si LVV Auger spectra for HF-treated silicon (100) samples subjected
to various oxidation processes.

Table I lists the results of the line shape analysis applied
to quantify the level of oxide growth. It is found that, during
various investigations of the desorbed (relatively inert) surface,
oxygen can accumulate on the surface without affecting
the silicon spectrum, especially when exposure is interrupted
for growth rate measurements. Therefore, exposures shown
are not interrupted and oxide coverage is determined by the
degree of oxidation shown in the silicon line shape itself.

Oxide coverage is determined by decomposition of the
normalized background-subtracted silicon spectrum into
three parts: $\mathrm{Si_{Si}}$, $\mathrm{Si_{SiO_2}}$, and $\mathrm{Si_{SiO_x}}$, where the sum of these
three components is unity. The total count is then given by

$$\text{total count} = \mathrm{Si_{Si}} + 3\,\mathrm{Si_{SiO_2}} + 2\,\mathrm{Si_{SiO_x}},$$

assuming the stoichiometry of $\mathrm{SiO}_x$ to be given by x = 1. The
fraction of silicon and silicon dioxide in the count is

$$F_{\mathrm{Si}} = \mathrm{Si_{Si}}/(\text{total count}), \qquad F_{\mathrm{SiO_2}} = 3\,\mathrm{Si_{SiO_2}}/(\text{total count}).$$

Attenuation of the pure silicon component by oxide is then
used to determine oxide thickness using the logarithmic decay
due to inelastic scattering characterized by the escape
depth parameter λ adjusted for detection angle.32,33 Thus

$$\text{thickness} = -\lambda_{\mathrm{ML}} \ln(F_{\mathrm{Si}}).$$

A value of $\lambda_{\mathrm{ML}}$ = 3.13 ML = 0.726 nm, where ML denotes
monolayer, is used in the calculations based upon a quartz-like
density and a 30° detector angle. The predicted oxygen
concentration is given by $F_{\mathrm{SiO_2}}$. The reacted oxygen is simply
the predicted oxygen concentration divided by the measured
concentration. Predicted oxygen might be higher than
actual because measured oxygen Auger electrons are less attenuated
than the $\mathrm{Si_{SiO_x}}$ electrons. This technique is very sensitive
in the 0-3 nm thickness regime, where small changes
in coverage cause obvious attenuation of the underlying species.

Fig. 6. Si LVV Auger spectra for sputter-cleaned silicon (100) surfaces
subjected to varied oxidation processes.
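
The line-shape bookkeeping described above can be sketched in a few lines of
code. This is an illustration under stated assumptions rather than the
analysis software used for Table I: the function name and the example
component values are hypothetical, the thickness is taken as the negative
logarithm of the silicon fraction so that it is positive for F_Si < 1, and
the monolayer-to-nanometre conversion uses the quoted equivalence
3.13 ML = 0.726 nm.

```python
import math

LAMBDA_ML = 3.13   # escape depth in monolayers (value quoted in the text)
LAMBDA_NM = 0.726  # the same escape depth in nm (value quoted in the text)


def oxide_coverage(si_si, si_sio2, si_siox):
    """Return (F_Si, F_SiO2, thickness in ML, thickness in nm).

    Inputs are the three components of the normalized, background-subtracted
    Si LVV spectrum; they are expected to sum to one.
    """
    total_count = si_si + 3.0 * si_sio2 + 2.0 * si_siox
    f_si = si_si / total_count
    f_sio2 = 3.0 * si_sio2 / total_count
    thickness_ml = -LAMBDA_ML * math.log(f_si)
    thickness_nm = thickness_ml * LAMBDA_NM / LAMBDA_ML
    return f_si, f_sio2, thickness_ml, thickness_nm


# Hypothetical decomposition, chosen only to exercise the formulas.
f_si, f_sio2, t_ml, t_nm = oxide_coverage(0.90, 0.04, 0.06)
print(f"F_Si = {f_si:.3f}, F_SiO2 = {f_sio2:.3f}, "
      f"thickness = {t_ml:.2f} ML = {t_nm:.3f} nm")
```

As a consistency check on the conversion, a thickness of 1.02 ML corresponds
to about 0.236 nm and 0.14 ML to about 0.032 nm, the coverage values quoted
in the text for moist-air and ultradry-oxygen exposures.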

The lack of growth for hydrogen-desorbed smooth
Si(100) surfaces supports the assertions of previous
work.24,34,35 Although this technique is limited in absolute
accuracy, there are a few clear interpretations: (a) clean,
smooth Si(100) is unreactive in absolutely dry oxygen at
room temperature even after heat treatments have supposedly
removed the hydrogen passivation, and (b) growth is slight
even with the help of water vapor. Finally, for the thinnest
oxides (HF treated, dry O2), there is little evidence for [SiO],
as predicted by Eqs. (13) and (14). However, SiO formation
is a fundamental process of our model, and these results
suggest a strong tendency for island growth in the initial
phases.

The use of a constant escape depth when the structure of
the initial oxide is not completely understood is less than
ideal. Work is continuing in this area, in the interpretation of
Auger peaks, and in the improvement of background subtraction.

VI. CONCLUSIONS

We have analyzed the dry oxidation of silicon utilizing a
model that invokes dissociative chemisorption in silicon at
the interface between the silicon dioxide film and the substrate.
The model supports the view that the diffusing
species in such processes is molecular rather than atomic
oxygen. The interfacial chemisorption framework as a model
for silicon oxidation has been shown to account for current
observations on the initial stages of silicon oxidation in
ultradry oxygen, in that the model readily accommodates
submonolayer reaction schemes in which the silicon dimer is
retained on the surface. The model suggests a temperature
dependence in the very early stages of the reaction that
accounts for conflicting observations on the relative degree of
silicon oxidation on exposure to oxygen at room temperature
(RT) and at temperatures intermediate between RT and that
required for sustained oxidation. The model predicts a self-limit
of 0.5-0.6 nm in oxidations performed at temperatures
sufficient to dissociate surface dimers and to permit oxygen
penetration of the substrate beyond the first monolayer of
suboxide. Bond energy considerations suggest that this temperature
regime is probably in the 400-600 °C interval for
dry oxidations. Detailed examination of the model has also
suggested a mechanism which can account for the observation
that oxide/silicon interfaces exhibit an inherent interfacial
roughness of approximately one atomic diameter or
about 0.3 nm.

Kinetic rate equations based on the model have been developed
which successfully account for the observed power
law dependence of rate on oxygen partial pressure. These
relationships have been used in the derivation of an expression
for the variation of oxide film growth rate with overlying
oxide thickness. The relationship has been tested against
experimental observations reported in the literature and
found to give an excellent fit.

Introduction


Silicon dioxide is the most important component in the fabrication of a MOS device. In
spite of the tremendous research in the last thirty years, it has not been possible to accurately model
the initial oxidation phase of silicon [1-12]. Thermally grown silicon dioxide in VLSI processing
applications ranges in thickness from 6 nm to 1000 nm. Some of the major functions of these films
include a) masking against ion implantation and diffusion; b) passivation of the silicon surface; c)
isolation of individual devices; d) use as a gate oxide and capacitor dielectric in MOS devices; and
e) use as a tunneling oxide in ROMs etc. [1]

The understanding of oxide formation mechanisms that was developed in the mid-1960s
was inadequate for determining the process and material properties important to device
performance as device fabrication began to scale down into the submicron regime. As device
geometries have shrunk, a number of issues previously considered unimportant or unobserved
have been found to require a better resolution in order for deep sub-micron and nanoscale devices
to be constructed [2].

Thermal  oxidation  of  silicon  has  been  usually  modeled  within  the  framework  originally 
formulated  by  Deal  and  Grove  [3].  They  proposed  that  the  oxidation  rate  was  determined  by  a 
combination  of  two  processes.  The  first  involved  the  actual  chemical  reaction  of  oxygen  with 
silicon  at  the  oxide/substrate  interface,  while  the  second  involved  the  diffusion  of  oxygen  through 
the  already  formed  oxide  film.  The  combination  of  these  processes  resulted  in  the  formulation  of 
the  classical  “linear-parabolic”  rate  law  of  Deal-Grove.  However,  the  simple  linear-parabolic  rate 
law  could  not  accurately  explain  the  growth  curves  obtained  for  oxides  of  thicknesses  below  20 
nm.  Massoud  and  Plummer  [4]  proposed  a  modification  to  the  original  Deal-Grove  model  by  the 
addition  of  two  separate  exponential  terms,  although  they  did  not  associate  any  relevant  physical 
processes  that  would  be  necessary  to  justify  the  addition  of  these  two  exponential  terms.  However, 
they  claimed  that  this  fit  the  data  better  in  the  thin  oxide  regime. 


Background 


It has been shown by experiments that oxidation proceeds by the inward movement of a
species of oxidant rather than by the outward movement of silicon [2-5]. Based on this
concept, Deal and Grove proposed their model:

$$\frac{dx}{dt} = \frac{F}{N_1} = \frac{kC^*}{N_1\left(1 + k/h + kx/D_{\mathrm{eff}}\right)},$$

where x is the oxide thickness, F the total flux of oxidant molecules through the oxide, k the first
order rate constant for oxidation, C* the equilibrium concentration of oxidant in the gas phase, N1
the number of oxidant molecules incorporated into a unit volume of the oxide layer, h the gas phase
mass transfer coefficient of oxygen and Deff the effective diffusion coefficient of oxygen in silicon
dioxide. Solution of the differential equation will result in

$$x^2 + Ax = B(t + \tau), \qquad A = 2D_{\mathrm{eff}}\left(\frac{1}{k} + \frac{1}{h}\right), \qquad B = \frac{2D_{\mathrm{eff}}C^*}{N_1}.$$

The Deal-Grove model requires the parameter τ (a shift in the time coordinate) in order to account
for the presence of an initial oxide layer on the silicon surface.
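
As an illustration of how the linear-parabolic law is used in practice, the
sketch below solves the standard Deal-Grove form x^2 + A x = B (t + tau) for
the oxide thickness and evaluates the corresponding growth rate B/(2x + A).
The function names are assumptions, and the parameter values (A = 0.165 um,
B = 0.0117 um^2/h, tau = 0.37 h) are figures commonly quoted for dry oxygen
at 1000 °C; they are used here as illustrative inputs and should be checked
against the original tabulation before any quantitative use.

```python
import math


def deal_grove_thickness(t, a, b, tau=0.0):
    """Positive root x of x**2 + a*x = b*(t + tau)."""
    return 0.5 * a * (math.sqrt(1.0 + 4.0 * b * (t + tau) / a ** 2) - 1.0)


def deal_grove_rate(x, a, b):
    """Growth rate dx/dt = b / (2*x + a) at thickness x."""
    return b / (2.0 * x + a)


# Illustrative parameters (commonly quoted for dry O2 at 1000 C).
A_UM = 0.165     # um
B_UM2_H = 0.0117  # um^2/h
TAU_H = 0.37     # h

for hours in (0.5, 1.0, 4.0):
    x_um = deal_grove_thickness(hours, A_UM, B_UM2_H, TAU_H)
    print(f"t = {hours} h: x = {1000.0 * x_um:.1f} nm, "
          f"rate = {1000.0 * deal_grove_rate(x_um, A_UM, B_UM2_H):.1f} nm/h")
```

For thin films the solution reduces to the linear law x = (B/A)(t + tau), and
for thick films to the parabolic law x^2 = B t. The time shift tau stands in
for the rapid initial growth that the linear-parabolic law itself does not
describe.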


The above approach was found to provide an excellent fit to experimental data for oxidation
processes utilizing H2O/O2 mixtures and for dry oxidation processes at thicknesses in excess of ca.
40 nm [6]. Deal-Grove estimates fail, however, in predicting oxidation rates in dry oxygen within
the thin regime (< 20 nm) and in accounting for the observed pressure dependence of both thin and
thick oxidations. Within the thin regime, observed oxidation rates in dry processes are significantly
greater than the Deal-Grove predictions and the reaction exhibits a power law dependence on the
oxygen pressure with rate proportional to P^m, where m = 0.6-0.8 [7,8,9].

Space-charge  effects  have  been  postulated  to  give  enhanced  rates  in  thin  oxidations  [9]. 
This  phenomenon  requires  the  oxidant  to  exist  and  diffuse  in  ionic  form  and  is  operative  only  over 
the  oxide  thicknesses  comparable  to  the  extent  of  the  space-charge  region.  Based  on  detailed 
analysis  of  experimental  parameters,  oxidation  results,  and  the  effects  of  the  electric  field  at  the 
interface  on  oxygen  transport,  it  generally  has  been  concluded  that  space  charge  effects  are  not  the 
mechanism giving rise to enhanced oxidation rates within the thin oxidation regime [4].

Other attempts at explaining elevated rates in the thin oxidations proposed structural
differences between thin and thick oxides, which lead to enhanced oxygen diffusion in the thin
region [10]. The presence of microchannels with diameters of about 5 nm has been proposed to
increase oxygen transport in thin oxides [11]. Some experimental evidence for such microchannels
exists [12]. Stress at the oxide/substrate interface is also thought to affect the rate of oxidation [11].
Stress has been invoked to vary either the transport of oxygen through the oxide film or the
intrinsic reaction rate at the interface [13-15]. Finally, variations in the oxidizing species have been
suggested as the potential sources of variation in oxidation rates in the thin and thick regimes [16].
Massoud's modification of the total growth rate expression is given by

$$\frac{dx}{dt} = \frac{B}{2x + A} + C_1 e^{-x/L_1} + C_2 e^{-x/L_2}.$$

The pre-factor C1 and decay length L1 describe the initial phase of the growth, and C2 and L2 describe
the intermediate phase of the growth. Jorgensen [17] suggested that the diffusing species in dry
oxygen is a singly ionized oxygen molecule, O2−. However, Jorgensen's experiments are
open to objection: Modlin’s experiment [18] gave results opposite to those of
Jorgensen. Furthermore, Modlin’s results indicated that, for diffusion-limited thermal oxidation,
the diffusing species is neutral molecular oxygen.

Most  of  the  above  discussions  retain  the  Deal-Grove  description.  They  also  retain  the  view 
that  only  the  standard  diffusion  mechanism  occurs  (e.g.  oxygen  molecules  diffuse  interstitially), 
and  then  they  introduce  some  modest  change,  like  a  stress  gradient,  and  hence  have  a  weak 
dependence  of  the  diffusion  and  interface  reaction  parameters  in  the  Deal-Grove  expressions.  The 
Mott-Cabrera  [19]  model  relies  upon  two  key  ideas.  First,  there  is  an  electron  tunneling  process 
which transfers charge across the oxide until the potential generated prevents further transfers.
Secondly,  the  electric  field  set  up  in  this  process  affects  the  injection  of  mobile  ions  into  the  oxide; 
there  are  no  effects  on  the  process  of  transport  across  the  oxide  in  this  model. 

It  is  widely  believed  (and  there  is  overwhelming  evidence  [22,  and  references  therein])  that 
most  transport  is  by  neutral  species,  but  there  are  special  cases  (e.g.  in  the  presence  of  an  electron 
beam)  when  charged  species  may  be  important.  Some  workers  strongly  favor  charged  species. 
Some,  even  though  supporting  the  neutral  oxygen  molecule  transport  in  the  thick  regime,  believe 
that the moving oxygen species in the thin oxide growth regime is charged [21].

Proposed  Silicon  Oxidation  Mechanism 

The  conceptual  model  proposed  here  involves  dissociative  chemisorption  of  molecular 
oxygen  for  the  process  of  oxidation  of  silicon  [32].  To  illustrate  the  model,  we  start  from  an  ideal 
clean  (2x1)  reconstructed  Si  (100)  surface  [See  Figs.  2  &  3].  Sphere  sizes  reflect  the  relative 
covalent  radii  of  silicon  and  oxygen  atoms.  The  system  always  tries  to  minimize  the  surface  energy 
by  minimizing  the  number  of  dangling  bonds  on  the  surface  or  by  satisfying  as  many  dangling 
bonds  as  possible.  It  is  assumed  that  the  surface  has  minimized  energy  by  reconstruction.  After  a 
(2x1)  reconstruction,  the  number  of  dangling  bonds  is  reduced  to  one  half  the  unreconstructed 


surface  value,  which  still  gives  rise  to  a  number  of  different  adsorption  sites  on  the  reconstructed 
surface,  which  is  apparent  in  the  diagram.  The  nature  of  these  adsorption  sites  and  the  energetic 
studies for oxygen adsorption have been reported by Oshiyama [22] and Zheng [5].

The  molecular  orbital  energy  diagram  for  ground  state  oxygen  is  shown  in  Fig.  4(a).  The 
electronic  configuration  of  the  oxygen  atom  is  explained  in  this  section.  The  Highest  Occupied 
Molecular Orbital (HOMO) consists of two half-filled, orthogonal and degenerate orbitals of
anti-bonding symmetry. The 95% probability surfaces for the orbitals are shown in Fig. 4(b). The
mixing of the anti-bonding HOMO with the highest energy orbitals of the silicon surface initiates
the very first interactions between the incoming oxygen molecule and the silicon substrate. The
HOMO on the (2x1) reconstructed Si (100) surface is postulated to be a half-filled “dangling bond”
on each silicon atom in the dimer units, presumably of sp3 hybridized symmetry. The relative
geometries of the surface “dangling bond”, projecting from the dimer into the gap between dimer
rows, and the orthogonal, half-filled π* anti-bonding HOMO on the oxygen molecule suggest that
maximum orbital mixing will occur when adsorption takes place in the initial oxygen chemisorption
step illustrated in Fig. 2. The interacting anti-bonding HOMO on the oxygen molecule is π* in
nature. Therefore, the electronic density incorporation from the surface silicon acts to weaken the
oxygen-oxygen  bond,  leading  to  a  dissociation  process  of  the  oxygen  molecule  and  a  consecutive 
insertion  of  the  oxygen  atoms  into  neighboring  silicon-silicon  backbonds.  At  this  stage  it  is  not 
clear  whether  these  three  processes  of  chemisorption,  bond  dissociation  and  bond  insertion  occur 
as  a  concerted  process  or  in  discrete  steps. 

Depending  on  the  growth  temperature,  three  different  initial  reactions  can  be  proposed 
under  anhydrous  conditions,  based  on  experimental  evidence  and  surface  analytical  studies  done  in 
our  laboratories  and  other  facilities  [23].  Figure  2  shows  a  possible  reaction  scheme  for  the 
dissociation  of  molecular  oxygen  on  the  silicon  surface  at  room  temperature.  Even  though 
thermodynamically favored Si-O bond formation may occur at lower temperatures, there appears to
be insufficient energy for dissociation of the silicon dimer bonds, and the reaction may be
kinetically limited to the formation of oxygen bridges between dimers with an oxygen coverage of ca.
0.5 monolayer [23]. However, as the temperature of the process is increased, more of the
energy required to dissociate the surface dimer bonds becomes available, which raises the
probability that the surface evolves from a (2x1) to a (1x1) oxygen surface superlattice through
extensive oxygen coverage. This produces a single monolayer coverage on the top surface.
However, there is not much experimental evidence supporting the formation of such a structure [24,25].

Oxidation studies performed at atmospheric pressure, and at temperatures high enough
to sustain oxidation, appear to self-limit at thicknesses of less
than 1 nm, whereas typical reported self-limiting thicknesses range from 0.6 to 1.5 nm [4]. Fig. 3
illustrates a possible explanation for this. Layer-by-layer growth of the very initial regime could be
explained  by  a  mechanism  which  involves  interactive  action  of  molecular  oxygen  chemisorption  at 
the  oxide-silicon  interface  followed  by  molecular  dissociation  of  oxygen  and  insertion  into  silicon 
backbonds.  The  extent  of  this  mechanism  for  the  initial  layer-by-layer  growth  is  crucially 
dependent  on  the  chemisorption  of  molecular  oxygen  through  orbital  mixing  with  the  unoxidized 
silicon  at  the  silicon/oxide  interface.  From  Fig.  3,  it  is  quite  clear  that  the  unavailability  of  the 
unoxidized  silicon  to  the  oxygen  adsorbed  on  the  surface  limits  the  initial  layer-by-layer  growth 
without  an  additional  driving  force  to  drive  the  oxygen  through  the  thin  oxide  film.  This  kind  of 
masking  or  barrier  effect  is  produced  by  just  two  fully  oxidized  layers  of  silicon  and  the  reaction 
will  therefore  cease  at  this  juncture  for  any  temperature  below  those  at  which  appreciable  oxygen 
diffusion  to  the  interface  can  occur.  The  approximate  thickness  of  this  self-limiting  layer  of  oxide 
film  is  0.5  -  0.6  nm,  which  is  consistent  with  experimentally  reported  values  for  self-limiting  oxide 
films  [4,  and  references  therein]. 

This  model  adequately  describes  the  structural  phenomena  reported  for  dry  oxidation  of 
clean  silicon  (100)  surfaces.  However,  it  should  be  noted  that  this  model  does  not  address 
oxidation  processes  in  the  presence  of  moisture  and  it  does  not  address  native  oxide  growth  or 
native  oxide  thicknesses  that  result  from  exposure  to  a  moist  ambient  prior  to  dry  oxidation.  It  is 
expected that the presence of moisture would significantly change the basic physical and chemical
mechanisms at the most fundamental level, so that such oxidations cannot be compared with dry ambient
experiments.

This  model  also  addresses  the  origin  and  evolution  of  the  inherent  interfacial  roughness 
which  is  always  present  at  the  silicon/oxide  interface  and  which  has  been  characterized  to  extend  to 
ca. one atomic diameter. Fig. 5 shows a (011) projection of the silicon (100) surface during the
process of oxidation. At the top of the figure, a mixed configuration of the initial oxide layer is
shown, in which both the (2x1) and (1x1) bridged oxygen structures are present on the surface. A
transient temperature ramp during the oxidation process may produce such a mixed
configuration on the surface. Such a surface configuration is expected to be the origin
of the inherent interfacial roughness. Regions of the surface that have retained the (2x1)
reconstructed structure would effectively cause a temporary blocking of oxygen penetration and
insertion in the underlying silicon backbonds. Penetration and insertion of the oxygen
molecule and oxidation may proceed in the areas with the more open (1x1) reconstructed
arrangement. Oxidation of the top layer backbonds gives a combination of the two types of surface
structures, as shown in the second and third schematics of Fig. 5. As the oxidation proceeds, and
the dimer is eventually oxidized, the discrepancy between the oxidation depth under initially (1x1)
and (2x1) surfaces may be expected to propagate with the interface, giving rise to an inherent
interfacial roughness of 0.3 nm, approximately the diameter of an Si-O grouping. Oxidation to the
final condition exhibiting such interfacial roughening is depicted in the lower three schematics of
Fig. 5.

Silicon  Oxidation  Kinetics 

The  conceptual  model,  described  in  the  previous  section,  is  utilized  to  formulate  the  rate 
equations  that  adequately  describe  the  oxidation  mechanism  of  silicon  (100)  surface  in  a  dry 
oxygen  ambient.  According  to  the  model,  the  fundamental  chemical  reaction  occurs  via  classical 
Langmuir kinetics. A major difference between this model and existing models is that an
intermediate species is produced prior to the final reaction product, SiO2. At this juncture,
this intermediate product is termed “SiO”. Even though several experimental results have
confirmed  that  the  average  stoichiometry  of  the  structure  that  exists  in  the  vicinity  of  the 
silicon/oxide  interface  corresponds  to  SiO,  it  is  acknowledged  that  a  completely  rigorous  treatment 
would  require  a  more  extensive  experimental  and  theoretical  analysis  of  the  intermediate  product 
and  subsequently  more  detailed  series  of  equations  to  describe  the  chemisorption  process  at  the 
interface.  Adopting  the  Langmuir  kinetic  formalism,  the  chemical  reaction  sequence  for  the 
oxidation  of  silicon  may  be  written  as: 


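$$\mathrm{Si} + \tfrac{1}{2}\,\mathrm{O_2} \;\underset{k_1'}{\overset{k_1}{\rightleftharpoons}}\; \mathrm{SiO} \tag{1}$$

$$\mathrm{SiO} + \tfrac{1}{2}\,\mathrm{O_2} \;\xrightarrow{\;k_2\;}\; \mathrm{SiO_2} \tag{2}$$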


It  should  be  noted  that  the  first  reaction  is  reversible  (i.e.  in  equilibrium),  but  the  second  is  not. 
The  existing  equilibrium  between  the  molecular  oxygen  chemisorbed  at  the  silicon/oxide  interface 
and  the  substrate  silicon  is  represented  by  the  first  reaction  sequence,  whereas  the  second  reaction 
sequence  describes  the  non-equilibrium  reaction  between  the  intermediate  species  and  the  additional 
chemisorbed oxygen to yield the final reaction product, silicon dioxide, SiO2. Here k1, k1' and k2 are the
usual reaction rate constants.

Using the second reaction, the rate of formation of the final product, SiO2, may be
expressed as

$$\frac{d[\mathrm{SiO_2}]}{dt} = k_2\,[\mathrm{SiO}]\,[\mathrm{O_2}]^{1/2}, \tag{3}$$

where [O2] is the concentration of chemisorbed molecular oxygen available at the silicon/oxide
interface  and  [SiO]  is  the  concentration  of  the  partially  oxidized  intermediate  species  at  the 
silicon/oxide interface. The concentration of oxygen at the interface can be determined by
employing the traditional diffusional approaches similar to those applied in the Deal-Grove model and
most  other  current  reaction  models.  The  availability  of  oxygen  in  the  equilibrium  reaction  will  vary 
within  the  reaction  time  frame  due  to  diffusion  limitations.  Therefore  any  steady-state  assumptions 
would  not  be  valid  in  this  reaction  and  the  usual  equilibrium  relationships  will  not  be  considered. 
However,  absolute  expressions  for  reaction  rates  remain  valid  and  the  concentration  of  the 
intermediate  species  may  be  calculated  by 

$$\frac{d[\mathrm{SiO}]}{dt} = k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2} - k_1'[\mathrm{SiO}] - k_2[\mathrm{SiO}][\mathrm{O_2}]^{1/2}. \tag{4}$$


After the formation of a thick layer of oxidized silicon, the concentration of the intermediate
product SiO remains almost constant. In other words, the rate of change of [SiO] is zero, i.e.
d[SiO]/dt = 0, in the long-time limit. In this limit, (4) gives

$$[\mathrm{SiO}] = \frac{k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}}{k_1' + k_2[\mathrm{O_2}]^{1/2}}. \tag{5}$$


Substituting  this  into  (3),  the  rate  of  formation  of  silicon  dioxide  film  can  be  expressed  as 

$$\frac{d[\mathrm{SiO_2}]}{dt} = \frac{k_1 k_2\,[\mathrm{Si}][\mathrm{O_2}]}{k_1' + k_2[\mathrm{O_2}]^{1/2}}. \tag{6}$$

Let N be a conversion factor from film thickness to concentration of SiO2, analogous to that
employed in the Deal-Grove model. The concentration of silicon [Si] is a constant at the
oxide/silicon interface. The growth rate can be rewritten as

$$N\frac{dx}{dt} = \frac{k_1 k_2\,[\mathrm{Si}][\mathrm{O_2}]}{k_1' + k_2[\mathrm{O_2}]^{1/2}}. \tag{7}$$
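
As a quick consistency check on the growth-rate expression, its two limiting cases can be written out. This is only a sketch: it assumes the reverse rate constant of the first reaction is written k_1' (a notational assumption) and that [O2] is treated as a fixed quantity.

```latex
% Limiting forms of  N dx/dt = k_1 k_2 [Si][O_2] / (k_1' + k_2 [O_2]^{1/2})
\begin{align*}
k_2[\mathrm{O_2}]^{1/2} \gg k_1' &:\quad
  N\frac{dx}{dt} \approx k_1\,[\mathrm{Si}]\,[\mathrm{O_2}]^{1/2}, \\
k_2[\mathrm{O_2}]^{1/2} \ll k_1' &:\quad
  N\frac{dx}{dt} \approx \frac{k_1 k_2}{k_1'}\,[\mathrm{Si}]\,[\mathrm{O_2}].
\end{align*}
```

In the first limit the rate is half-order in the interfacial oxygen concentration, and in the second it is first-order, so the apparent reaction order depends on which of the two steps consumes the intermediate faster.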


It  is  clear  that  the  oxide  film  growth  rate  is  solely  controlled  by  the  availability  of  oxygen  at 
the  oxide/silicon  interface.  Therefore,  the  only  crucial  variable  quantity  as  the  growth  time 
progresses  is  the  concentration  of  oxygen  [O2].  The  growth  rate  expression  (7)  was  obtained  from 
the fundamental reaction rate equation, (6), and it is applicable to the oxidation of silicon in a dry
oxygen ambient. However, it characterizes the growth only in the thick oxide regime.
It is therefore precisely this function that has to be modified for the transient regime of thin
oxides.

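In order to get to the transient regime, (4) is multiplied on both sides by the integrating
factor e^{γt}, and the following expression is obtained:

$$\frac{d[\mathrm{SiO}]}{dt}\,e^{\gamma t} = k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}e^{\gamma t} - \left(k_1' + k_2[\mathrm{O_2}]^{1/2}\right)[\mathrm{SiO}]\,e^{\gamma t}. \tag{8}$$

If we define γ = k1' + k2[O2]^{1/2}, then we can rewrite (8) as

$$\frac{d[\mathrm{SiO}]}{dt}\,e^{\gamma t} = k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}e^{\gamma t} - \gamma\,[\mathrm{SiO}]\,e^{\gamma t}. \tag{9}$$

Integrating both sides of (9) gives

$$[\mathrm{SiO}] = \frac{k_1[\mathrm{Si}][\mathrm{O_2}]^{1/2}}{k_1' + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-\gamma t}\right). \tag{10}$$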
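
The transient solution for the intermediate species can be checked numerically. The sketch below integrates the rate equation d[SiO]/dt = k1 [Si][O2]^(1/2) - (k1' + k2 [O2]^(1/2)) [SiO] with a classical Runge-Kutta scheme and compares it with the saturating exponential form. It assumes that [O2] is constant and that [SiO] = 0 at t = 0. The parameter values, and the name `k1r` for the reverse rate constant k1', are hypothetical and dimensionless, chosen only for illustration.

```python
import math


def sio_closed_form(t, k1, k1r, k2, si, o2):
    """Closed-form [SiO](t), assuming constant [O2] and [SiO](0) = 0."""
    root = math.sqrt(o2)
    gamma = k1r + k2 * root            # gamma = k1' + k2 [O2]^(1/2)
    steady = k1 * si * root / gamma    # long-time (steady-state) value
    return steady * (1.0 - math.exp(-gamma * t))


def sio_numeric(t_end, k1, k1r, k2, si, o2, steps=2000):
    """Integrate d[SiO]/dt = k1 [Si][O2]^(1/2) - gamma [SiO] with classical RK4."""
    root = math.sqrt(o2)
    gamma = k1r + k2 * root
    source = k1 * si * root

    def f(y):
        return source - gamma * y

    h = t_end / steps
    y = 0.0
    for _ in range(steps):
        s1 = f(y)
        s2 = f(y + 0.5 * h * s1)
        s3 = f(y + 0.5 * h * s2)
        s4 = f(y + h * s3)
        y += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return y


# Hypothetical, dimensionless values; not taken from the report.
params = dict(k1=2.0, k1r=0.5, k2=1.5, si=1.0, o2=4.0)
numeric = sio_numeric(1.0, **params)
analytic = sio_closed_form(1.0, **params)
```

For small t the closed form grows linearly from zero, which is the behavior the text associates with the absence of SiO2 production at very short times.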


To  obtain  the  oxide  film  growth  rate  in  the  transient  thin  oxide  regime,  we  substitute  this 
expression  for  [SiO]  into  (3)  to  get 


$$\frac{d[\mathrm{SiO_2}]}{dt} = \frac{k_1 k_2\,[\mathrm{Si}][\mathrm{O_2}]}{k_1' + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-\gamma t}\right). \tag{11}$$


According to (11), there is no production of SiO2 for thin oxides, i.e. for small times. This is a
very important factor which might suggest nucleation and island formation in the very initial
stages. It is convenient to move to the spatial domain from the temporal domain at this point.
Therefore, set γt = x/a, where “a” is a characteristic length, “x” is the oxide film thickness and
“t” is the oxidation time. This allows the growth rate to be rewritten as

$$N\frac{dx}{dt} = \frac{k_1 k_2\,[\mathrm{Si}][\mathrm{O_2}]}{k_1' + k_2[\mathrm{O_2}]^{1/2}}\left(1 - e^{-x/a}\right). \tag{12}$$


Equation  (12)  relates  the  growth  rate  of  the  oxide  film  to  the  concentration  of  molecular  oxygen  at 
the  oxide/silicon  interface.  The  concentration  of  molecular  oxygen  at  the  interface  cannot  be 
experimentally  determined.  Therefore  it  is  necessary  to  derive  a  relationship  between  the  interfacial 
molecular  oxygen  concentration  and  the  surface  concentration  of  oxygen  molecules  (i.e.  at  the 
gas/oxide  boundary).  This  quantity  can  be  calculated  if  the  oxygen  partial  pressure  in  the  gas  phase 
is  known.  Employing  Fick’s  law,  along  with  Henry’s  law  near  the  gas/oxide  boundary,  and 
invoking  a  flux  picture  in  a  derivation  analogous  to  that  of  Deal-Grove,  it  is  possible  to  obtain  a 
relationship  that  relates  concentrations  of  oxygen  molecules  near  the  gas/oxide  and  the 
oxide/substrate  interfaces  [Figure  1].  The  steady-state  fluxes  are  given  by 

$$\begin{aligned}
F_1 &= h\,(C^* - C_0), \\
F_2 &= \frac{D_{\mathrm{eff}}}{x}\,(C_0 - C_i), \\
F_3 &= k_3\,C_i.
\end{aligned} \tag{13}$$

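The steady-state condition dictates continuity of fluxes. Therefore F1 = F2 = F3, and solving for C0
and Ci in terms of the other variables Deff, h, C* and x, we obtain

$$C_0 = \frac{x h\,C^*}{D_{\mathrm{eff}} + x h} + \frac{D_{\mathrm{eff}}}{D_{\mathrm{eff}} + x h}\,C_i \tag{14}$$

and

$$C_i = \frac{h D_{\mathrm{eff}} C^*}{k_3 D_{\mathrm{eff}} + h D_{\mathrm{eff}} + x k_3 h} \approx \frac{D_{\mathrm{eff}} C^*}{D_{\mathrm{eff}} + k_3 x}, \tag{15}$$

since h ≫ k3. Substituting for Ci from (15) into (14) gives

$$C_0 = \frac{x h\,C^*}{D_{\mathrm{eff}} + x h} + \frac{D_{\mathrm{eff}}}{D_{\mathrm{eff}} + x h}\left(\frac{D_{\mathrm{eff}} C^*}{D_{\mathrm{eff}} + k_3 x}\right). \tag{16}$$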
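
The flux-continuity argument can be illustrated with a short calculation. The sketch below solves F1 = F2 = F3 for the two concentrations, with F1 = h (C* - C0), F2 = (Deff / x)(C0 - Ci) and F3 = k3 Ci, and then checks that the three fluxes agree and that the h ≫ k3 approximation for Ci is close to the exact value. The numerical values of `h`, `k3` and `x` are assumptions made for illustration only; the exponent of k3 is not legible in the source, so the value used here is hypothetical.

```python
def interface_concentrations(c_star, h, d_eff, k3, x):
    """Return (C0, Ci) satisfying F1 = F2 = F3 for the three steady-state fluxes."""
    c_i = h * d_eff * c_star / (k3 * d_eff + h * d_eff + x * k3 * h)
    c_0 = (x * h * c_star + d_eff * c_i) / (d_eff + x * h)
    return c_0, c_i


def fluxes(c_star, h, d_eff, k3, x, c_0, c_i):
    """Return (F1, F2, F3): gas-phase transport, diffusion, interface reaction."""
    f1 = h * (c_star - c_0)
    f2 = d_eff / x * (c_0 - c_i)
    f3 = k3 * c_i
    return f1, f2, f3


# Illustrative values only: C* normalized to 1, x in micrometres,
# d_eff in um^2/hr; h and k3 (um/hr) are hypothetical.
C_STAR, H, D_EFF, K3, X = 1.0, 1.0e8, 2531.0, 3.6e4, 0.05

c0, ci = interface_concentrations(C_STAR, H, D_EFF, K3, X)
f1, f2, f3 = fluxes(C_STAR, H, D_EFF, K3, X, c0, ci)
ci_approx = D_EFF * C_STAR / (D_EFF + K3 * X)   # limit h >> k3
```

With h several orders of magnitude larger than k3, C0 stays within a fraction of a percent of C*, which is the content of the approximation C0 ≈ C*.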


Values of k3 and Deff under typical thermal oxidation conditions have been previously determined.
At 1000 °C and 760 Torr of oxygen, k3 = 3.6 × 10^[exponent illegible in source] µm/hr and
Deff = 2531 µm²/hr [26]. Using
these values, the following approximations may be made for oxidation under similar conditions:

$$C_0 \approx C^* \tag{17}$$

and

$$C_i \approx \frac{D_{\mathrm{eff}} C^*}{D_{\mathrm{eff}} + k_3 x}. \tag{18}$$


Rewriting (12) using these forms for the concentrations of oxygen and reformulating the constants
yields

$$N\frac{dx}{dt} = k_3 C_i + \alpha C_0\,e^{-x/a}, \tag{19}$$

and substituting for Ci and C0 gives

$$N\frac{dx}{dt} = k_3\,\frac{D_{\mathrm{eff}} C^*}{D_{\mathrm{eff}} + k_3 x} + \alpha C^* e^{-x/a}, \tag{20}$$

where α is the initial 'ballistic' reaction rate coefficient, which is explained in detail later.
Thus,

$$\frac{dx}{dt} = \frac{k_3 D_{\mathrm{eff}} C^*}{N\,(D_{\mathrm{eff}} + k_3 x)} + \frac{\alpha C^*}{N}\,e^{-x/a}. \tag{21}$$

This expression for dx/dt may be compared with existing experimental data on the oxidation rate of
silicon for the evaluation of this model. In the thin limit (< 20 nm), it accurately predicts the
observed enhancement in oxide growth rates as compared with the rates at larger oxide thickness.
Beyond oxide thicknesses of 20 nm, the relationship gives rates that are comparable with those
predicted by the Deal-Grove model and with experimental observations. The parameter “a” in the
exponential  term  is  a  characteristic  length  which  is  associated  with  the  onset  of  diffusional  effects 
of  oxygen  through  the  oxide  as  the  film  grows. 
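
To show how the growth law behaves, the sketch below integrates dx/dt = k3 Deff C* / (N (Deff + k3 x)) + (α C* / N) exp(-x/a) with a classical Runge-Kutta scheme. Setting α = 0 leaves only the first, Deal-Grove-like term, which in the dimensionless units used here has the exact solution x = -1 + sqrt(1 + 2t) and serves as a check on the integrator. All parameter values are hypothetical and dimensionless; they are not fitted to any of the data sets discussed in the text.

```python
import math


def growth_rate(x, k3, d_eff, c_star, n, alpha, a):
    """dx/dt: linear-parabolic term plus the exponentially decaying ballistic term."""
    linear_parabolic = k3 * d_eff * c_star / (n * (d_eff + k3 * x))
    ballistic = alpha * c_star / n * math.exp(-x / a)
    return linear_parabolic + ballistic


def oxide_thickness(t_end, x0=0.0, steps=20000, **p):
    """Integrate the growth law from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        s1 = growth_rate(x, **p)
        s2 = growth_rate(x + 0.5 * h * s1, **p)
        s3 = growth_rate(x + 0.5 * h * s2, **p)
        s4 = growth_rate(x + h * s3, **p)
        x += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return x


# Hypothetical dimensionless parameters, for illustration only.
base = dict(k3=1.0, d_eff=1.0, c_star=1.0, n=1.0, a=0.1)

x_without_ballistic = oxide_thickness(4.0, alpha=0.0, **base)
x_with_ballistic = oxide_thickness(4.0, alpha=5.0, **base)
```

Because the exponential term decays on the length scale a, the two curves differ mainly by an early offset in thickness, while their growth rates converge once x is several times a.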

In the thin regime (for smaller values of “x”), the Deal-Grove model reduces to

x = [expression illegible in source]    (22)

On the other hand, (21) reduces to

N dt = [coefficient illegible in source] e^{x/a} dx,    (23)

x = [expression illegible in source]    (24)


Equation (24) gives an enhanced growth rate if k3/α < 1. In other words, the initial oxidation gives
an enhanced growth rate (for x → 0) if

$$\alpha C_0 > k_3 C_i. \tag{25}$$


Results  and  Discussion 


We use data taken from Massoud et al. [4], Irene and Van de Meulen [8], Kamigaki and
Itoh [29] and Chao et al. [30]. As described above, these data are initially fit in the thick oxide regime
with Deal-Grove. Fig. 6 shows the linear reaction coefficient k3 and the diffusion coefficient D
for all the data on a single plot. The activation energies associated with these two parameters are
1.54 eV and 2.58 eV, respectively. The Si-Si bond energy is 1.83 eV [21], and the closeness of
the activation energy for k3 to this value suggests that the controlling mechanism is the breaking of
the Si-Si bonds. A higher activation energy in the diffusion-controlled regime explains the
decreased  growth  rate  in  that  regime  where  the  growth  rate  is  controlled  by  the  diffusivity  of  the 
oxidizing  species.  As  the  oxidation  temperature  is  raised,  the  oxidant  molecules  are  driven  deeper 
into  the  already  formed  oxide  before  they  react  at  the  interface  with  the  substrate  silicon.  Raising  of 
the  temperature  would  produce  substantial  thermal  vibration  to  open  up  the  oxide  network 
structure,  thus  enabling  the  oxidant  to  traverse  further  through  the  already  formed  oxide.  It  has 
been  reported  previously  [3]  that  the  activation  energy  associated  with  the  bulk  diffusion  of 
oxygen  in  fused  silica  is  1.17  eV.  This  is  far  smaller  than  the  value  found  here. 
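
Activation energies of this kind are usually read from the slope of an Arrhenius plot. The sketch below shows that step as a least-squares fit of ln(rate) against 1/(k_B T). The data points are synthetic, generated from an assumed Arrhenius law with a 1.54 eV barrier and a hypothetical prefactor, so the example only demonstrates the procedure and does not reproduce the fit of Fig. 6.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def activation_energy(temps_c, rates):
    """Least-squares line through ln(rate) vs 1/(k_B T).

    Returns (E_a in eV, prefactor), where rate = prefactor * exp(-E_a / (k_B T)).
    """
    xs = [1.0 / (K_B_EV * (t + 273.15)) for t in temps_c]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


# Synthetic rates from an assumed Arrhenius law; not measured values.
temps = [800.0, 850.0, 900.0, 950.0, 1000.0]
synthetic = [3.0e7 * math.exp(-1.54 / (K_B_EV * (t + 273.15))) for t in temps]

e_a, prefactor = activation_energy(temps, synthetic)
```

With real data the scatter about the straight line gives a first indication of whether a single activated process controls the rate over the whole temperature range.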

Massoud’s  [4]  samples  contained  ca.  1.0  nm  of  oxide  prior  to  thermal  oxidation  whereas 
Irene  [8]  reported  the  presence  of  0.3  -  0.6  nm  of  oxide  before  the  wafers  were  introduced  into  the 
oxidation  chamber.  Kamigaki’s  [29]  samples  reportedly  had  a  0.6  nm  thick  oxide  layer  while 
Chao’s  [30]  ellipsometry  detected  a  1.6  -  2.0  nm  oxide  film  prior  to  oxidation.  Massoud,  Irene  and 
Chao  used  an  in-situ  ellipsometer  whereas  Kamigaki’s  samples  were  taken  out  of  the  oxidation 
furnace  in  order  to  perform  the  oxide  thickness  measurements  by  ellipsometer.  The  reported  initial 
oxide thicknesses agreed quite well with the initial oxide thicknesses found in fitting the
Deal-Grove model (as explained in the section above) to determine the k3 and D values. All the
samples were lightly doped. All the experiments were performed at atmospheric pressure.
Following the first evaluation, (21) is now used to fit each set of data, but with k3 and D
evaluated from the straight lines in Fig. 6. The only variable is the parameter a.
Figure 7 shows the oxide thickness as a function of oxidation time for Massoud’s
experimental  data  and  the  fits  to  the  model  (21).  The  best  value  of  the  characteristic  length  a  is 
found  at  each  oxidation  temperature.  The  value  of  the  characteristic  length  a  varies  between  0.28 
nm  and  1.0  nm  over  the  temperatures  800  -  1000  C.  In  Fig.  8,  we  show  the  comparison  between 
model  (21)  and  Irene’s  experimental  data.  The  fit  is  excellent  in  all  temperature  ranges.  The  range 
of  the  characteristic  length  a  is  0.3  - 1.0  nm.  Kamigaki’s  experiment  was  done  at  several  different 
pressure  ranges.  However,  the  comparison  of  Fig.  9  shows  only  the  growth  curves  for  the 
experiments  performed  under  atmospheric  pressure  conditions.  A  minimal  number  of  data  points 
in  Kamigaki’s  and  Chao’s  experiments  prevented  us  from  making  a  definitive  distinction  between 
reaction controlled and diffusion controlled regimes for computation of the activation energies. The
value of a ranged from 0.15 to 0.30 nm over the temperature range 950 - 1100 C in
Kamigaki’s experiments, while we find 0.33 - 1.25 nm in Chao’s experiments (Fig. 10). The fits
are  excellent. 
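
The one-parameter fitting procedure described above can be sketched as follows. Thickness-versus-time curves are generated from the growth law dx/dt = k3 Deff C* / (N (Deff + k3 x)) + (α C* / N) exp(-x/a), and the characteristic length a is chosen from a grid of candidates by minimizing the sum of squared residuals. This is a minimal illustration rather than the fitting code used for Figs. 7 to 10: the data are synthetic, all parameters are dimensionless and hypothetical, and α is treated as known.

```python
import math


def thickness_curve(times, a, alpha, k3=1.0, d_eff=1.0, c_star=1.0, n=1.0, dt=1e-3):
    """Oxide thickness at each time in `times` (ascending), starting from x = 0."""
    def rate(x):
        return (k3 * d_eff * c_star / (n * (d_eff + k3 * x))
                + alpha * c_star / n * math.exp(-x / a))

    out, x, t = [], 0.0, 0.0
    for target in times:
        steps = max(1, round((target - t) / dt))
        h = (target - t) / steps
        for _ in range(steps):
            x += h * rate(x + 0.5 * h * rate(x))   # explicit midpoint (RK2) step
        t = target
        out.append(x)
    return out


def fit_characteristic_length(times, data, alpha, candidates):
    """Return the candidate value of a with the smallest sum of squared residuals."""
    def sse(a):
        model = thickness_curve(times, a, alpha)
        return sum((m - d) ** 2 for m, d in zip(model, data))

    return min(candidates, key=sse)


# Synthetic "measurements" generated with a = 0.30; not experimental data.
times = [0.1, 0.2, 0.5, 1.0, 2.0]
synthetic = thickness_curve(times, a=0.30, alpha=5.0)

grid = [0.05 * i for i in range(1, 21)]   # candidates 0.05, 0.10, ..., 1.00
best_a = fit_characteristic_length(times, synthetic, alpha=5.0, candidates=grid)
```

A grid search is used here only because it is transparent; with noisy data a finer grid or a standard one-dimensional minimizer would be the natural refinement.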

In  Fig.  11,  we  plot  the  values  of  the  characteristic  length  a  as  a  function  of  the  growth 
temperature. The average value of a is ca. 0.35 nm, which is approximately the height of a Si-O
molecule at the Si/SiO2 interface, for temperatures up to 950 C. However, Kamigaki’s data is
different.  In  the  latter  data,  nitrogen  was  used  to  dilute  the  oxygen  gas.  There  is  a  good  possibility 
that  the  nitrogen  molecules  would  have  initially  blocked  some  of  the  surface  sites,  preventing  a 
rapid  initial  growth  regime. 

The  reaction  controlled  regime  is  a  strong  function  of  the  initial  surface  structure  prior  to 
thermal  oxidation.  The  cleaning  processes,  surface  passivation  techniques  and  the  initial  native 
oxide  layer  thickness  are  some  of  the  major  factors  that  would  affect  the  value  of  the  characteristic 
length  at  the  very  fundamental  level.  Significantly  different  cleaning  and  surface  passivation 
techniques  were  used  by  the  various  researchers  whose  experimental  data  were  compared  in  this 
report.  They  also  had  considerably  different  thicknesses  of  native  oxide  layer  present  on  their 
samples prior to the thermal oxidation process. Nevertheless, the values of a agree quite well.


The  rise  in  a  above  950  C  presumably  signals  a  more  reactive  interface  process  or  a 
structural  change  or  both.  We  postulate  three  different  reactions  that  may  occur  at  the  very 
beginning  of  the  oxidation  process.  The  parameter  a  may  very  well  characterise  the  thickness  of 
the  initial  self-limiting  oxide  layer.  At  very  low  oxidation  temperatures,  the  value  of  a  is  ca.  0.33 
nm  which  agrees  well  with  the  idea  that  the  top  layer  silicon  dangling  bonds  are  the  only  ones  that 
are  oxidized.  As  the  temperature  of  oxidation  is  raised  the  oxygen  molecules  may  have  enough 
kinetic  energy  to  penetrate  through  the  silicon  lattice  to  oxidize  the  back  bonds  and  account  for 
the rise in a. This may be true until the temperature is ca. 950 C. It has been reported by Irene,
Tierney and Angiello [33] that there is a change in the density of the oxide film at ca. 950 C. They
observed a 3% change in density going from 600 C to 1150 C. Based on the values reported
by Irene et al. [33], 2.28 - 2.20 g/cm3, comparisons with available data [23] indicate the
formation of either the tridymite or the cristobalite form of silica. We believe [34] that this change in the
value of a may be related to the penetration depth of the initial layer of oxide formation, which in
turn depends on the structural porosity of the silicon lattice for oxygen. Higher oxidation
temperatures may also enhance the porosity of the silicon lattice by opening up the top few layers,
thereby allowing the initial oxidation to continue until ca. 1.2 nm of oxide is formed. However, at
this stage it is not possible to conclude, without further experimental evidence, what form of silica
is being formed at these temperatures and whether there is actually a structural change occurring at
950 C.
evidence. 

The fitting parameter α is kept constant so that α/k3 > 1. This parameter describes the rate
at which the oxygen molecules penetrate the Si (2x1) reconstructed surface to oxidize the uppermost
layers of the substrate. A typical value [34] of α in the temperature range of 800 - 1000 C is
ca. 106 µ/min, whereas the linear reaction rate coefficient, k3, is ca. 10^[exponent illegible in
source] µ/min. We expect α
to have a stronger dependence on gas phase oxygen pressure than on the oxidation temperature.
Apparently, in this very initial ballistic growth regime, for lack of a better word, the
diffusivity of the oxygen species does not play a major role.

Conclusion

A  model  has  been  proposed  for  dry  oxidation  of  silicon  which  invokes  dissociative 
chemisorption in silicon at the interface between the silicon dioxide film and the substrate. This
model supports the view that the diffusing species in such processes is molecular rather than atomic oxygen.
This  model  also  predicts  a  self-limiting  thickness  of  0.5  -  0.6  nm  in  oxidations  performed  at 
temperatures  sufficient  to  dissociate  surface  dimers  and  permit  oxygen  penetration  of  the  substrate 
beyond  the  first  monolayer  of  sub-oxide.  A  preliminary  examination  of  the  model  has  also 
suggested  a  mechanism  which  can  account  for  the  observation  that  oxide/silicon  interfaces  exhibit 
an inherent interfacial roughness of approximately one atomic diameter, or about 0.3 nm. Kinetic
rate equations have been developed and tested against experimental observations reported in
the literature and found to give an excellent fit.

We  have  compared  relevant  experimental  data  for  dry  oxidation  of  silicon  with  a  model  that 
invokes  dissociative  chemisorption  in  silicon  at  the  interface  between  the  silicon  dioxide  film  and 
the  substrate.  This  relationship  has  been  tested  against  significant  experimental  observations 
reported  in  the  literature  and  found  to  give  an  excellent  fit.  It  should  be  noted  that  insufficient  data 
points  in  Kamigaki’s  and  Chao’s  experiments  prevented  us  from  obtaining  definitive  activation 
energies.  Different  cleaning  procedures  and  surface  passivation  techniques  prior  to  oxidation 
may also be the cause of the significant deviation of the model from experimental observation in
some  of  the  cases.  Dilution  of  the  oxygen  environment  by  nitrogen  is  postulated  to  play  a  crucial 
role  at  the  very  initial  stages  of  oxidation  by  blocking  some  of  the  surface  sites  preventing  a  rapid 
initial  growth.  The  value  of  a  is  believed  to  be  the  thickness  of  the  very  initial  layer  oxide  that  is 
formed  when  the  2x1  reconstructed  silicon  surface  is  exposed  to  an  oxygen  ambient.  This  might  be 
a phenomenon to be analyzed in detail experimentally in order to study island growth modes initiated
at the very initial stages by blocked surface sites. It is our interpretation that the initial ballistic
growth regime may be strongly dependent on oxygen pressure rather than on the temperature.
Further experimental evidence is required before conclusive remarks can be made on the
structural evolution of the initial oxidation phase and the pressure dependence of the ballistic
growth regime.

Abstract:

In  the  context  of  semiconductor  manufacturing,  chemical  vapor  deposition  (CVD)  denotes 
the  deposition  of  a  solid  from  gaseous  species  via  chemical  reactions  on  the  wafer  surface.  In 
order to obtain a realistic process model, this paper proposes the introduction of an intermediate
scale model on the scale of a die. Its mathematical model is a reaction-diffusion equation
with associated boundary conditions including a flux condition at the micro structured
surface. The surface is given in general parameterized form. A homogenization technique
from asymptotic analysis is used to replace this boundary condition by a condition on the
flat  surface  to  make  a  numerical  solution  feasible.  Results  from  a  mathematical  test  problem 
are  included. 

Keywords: 

homogenization, asymptotic analysis, finite differences, partial differential equations, chemical
vapor deposition, chemical engineering.


INTRODUCTION 


To model chemical vapor deposition (CVD) in single wafer reactors, attempts have been
made at linking reactor scale models (RSM) and feature scale models (FSM) to obtain
realistic simulation results [1]. In these studies, reactor scale predictions have been used as
inputs to feature scale models, but no information was fed back from the feature scale to the
reactor scale. However, features are typically arranged in clusters, which remains unaccounted
for in this approach. Also, any direct combination of these models must suffer from the vast
differences in length scales between the reactor scale (10^-1 m) and the feature scale (10^-6 m).

Therefore,  we  propose  the  introduction  of  a  mesoscopic  scale  model  (MSM)  on  the  scale 
of a die to remedy these problems. A schematic is shown in Figure 1. To obtain an integrated
process  simulator,  MSMs  encompassing  several  clusters  of  features  each  are  introduced  at 
several  positions  on  the  wafer  bridging  the  length  scale  differences  between  the  reactor  scale 
and  the  feature  scale.  By  encompassing  several  feature  clusters,  a  MSM  also  accounts  for 
the  effects  of  varying  density  of  feature  clustering  and  of  cluster  spacing.  For  the  study  of 
these  feature-to-feature  effects,  the  MSM  can  also  be  used  in  a  stand-alone  mode. 

Mathematically,  the  domain  of  the  MSM  is  comprised  of  the  gas  phase  just  above  one  die 
on  the  wafer  surface.  Assuming  that  the  pressure  is  sufficiently  high,  the  model  consists  of 
a  reaction-diffusion  equation  with  associated  boundary  conditions  for  each  chemical  species 
in  the  gas  domain;  specifically  at  the  wafer  surface,  a  flux  condition  is  imposed.  It  is  this 
boundary  condition  that  makes  the  problem  numerically  challenging,  since  it  is  impossible 
even  for  a  die  scale  model  to  accurately  resolve  features  on  the  scale  of  the  micro  structure. 

We  solved  this  problem  using  a  homogenization  technique  from  asymptotic  analysis, 
which  allows  for  the  replacement  of  the  micro  structured  surface  by  a  flat  surface  by  taking 
into  account  the  increase  in  surface  area  inside  the  feature  clusters.  The  mathematical 
derivation  for  surfaces  that  can  be  expressed  in  functional  form  has  been  given  in  [3].  This 
paper  extends  the  derivation  to  arbitrary  surfaces  in  parameterized  form,  which  allows  for 
instance  for  overhangs  at  the  sides  of  the  features. 


DERIVATION 


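Mathematically, the problem is given as a reaction-diffusion equation with associated
boundary conditions. After a suitable non-dimensionalization procedure, the dimensionless
concentration $u(x, y)$ has to satisfy the differential equation

$$\frac{\partial u}{\partial t} = \operatorname{div}_{xy}\bigl(D(x, y)\,\nabla_{xy} u\bigr) + R_g(u, x, y) \tag{1}$$

as well as the boundary conditions

$$-e_1^T(-D\,\nabla_{xy} u) = 0 \quad \text{at } x = 0, \tag{2}$$

$$e_1^T(-D\,\nabla_{xy} u) = 0 \quad \text{at } x = X, \tag{3}$$

$$u = c_{top}(x) \quad \text{at } y = Y, \tag{4}$$

$$\nu^T(-D\,\nabla_{xy} u) = S(u, x, y) \quad \text{for } (x, y) \in \Gamma_w. \tag{5}$$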


$D$ is a symmetric positive definite diffusivity matrix and $e_1 = (1, 0)^T$ the first unit vector.
$\Gamma_w$ denotes the parameterized wafer surface and $\nu$ the outer unit normal vector on $\Gamma_w$.

The wafer surface $\Gamma_w$ is parameterized with the macroscopic variable $s$ as

$$(x, y) = \bigl(s + \varepsilon\,\alpha(s/\varepsilon),\ \varepsilon\,\beta(s, s/\varepsilon)\bigr), \qquad 0 < s < 1. \tag{6}$$

In a fully periodic surface, the surface would be assumed to be periodic in $s$ with period $\varepsilon$,
where $0 < \varepsilon \ll 1$ is a “small” quantity. However, the surface varies on both the macro scale
as well as the micro scale; this fact is explicitly modeled by the dependence on the slowly
changing (macroscopic) variable $s$ and the fast changing (microscopic) variable $\sigma = s/\varepsilon$,
respectively. In the definition of $x$, $\varepsilon\alpha(\sigma)$ then represents a microscopic perturbation of $s$,
which allows for instance for overhangs at the feature sides on the micro scale. In $y$, $\varepsilon\beta(s, \sigma)$
models the microscopic surface height depending on both the macroscopic and microscopic
parameterization. Since the surface can clearly be very different from one region to another
(macroscopically), periodicity is only assumed in the fast changing variable $\sigma$ (microscopically).


Hence, all surface related quantities are assumed to be periodic with period 1 in the fast
changing variable; in particular, the surface variables $\alpha$ and $\beta$ satisfy

$$\alpha(\sigma + 1) = \alpha(\sigma), \qquad \beta(s, \sigma + 1) = \beta(s, \sigma) \quad \text{for all } 0 < s < 1. \tag{7}$$

This  really  means  that  adjacent  features  are  assumed  to  be  identical,  while  distant  features 
can  be  different.  It  is  assumed  that  the  parameterization  is  well-defined. 
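
To illustrate why the parameterized form (6) is more general than a graph y = h(x), the following minimal sketch evaluates the parameterization for one hypothetical choice of the periodic functions alpha and beta. The specific functions, the envelope 4s(1 - s), and the value of epsilon are assumptions made for illustration only and are not taken from the paper. With this choice, x is not monotone in s, so the surface has overhangs that a functional form could not represent.

```python
import numpy as np

def alpha(sigma):
    # Hypothetical periodic perturbation of s (period 1 in sigma).
    # Its slope reaches -1.6, so dx/ds = 1 + alpha'(sigma) becomes negative.
    return 0.8 / (2.0 * np.pi) * np.sin(4.0 * np.pi * sigma)

def beta(s, sigma):
    # Hypothetical surface height: periodic in sigma, slowly modulated in s.
    return 4.0 * s * (1.0 - s) * 0.5 * (1.0 - np.cos(2.0 * np.pi * sigma))

def wafer_surface(s, eps):
    """Evaluate (x, y) = (s + eps*alpha(s/eps), eps*beta(s, s/eps))."""
    sigma = s / eps
    x = s + eps * alpha(sigma)
    y = eps * beta(s, sigma)
    return x, y

eps = 1.0 / 16.0                      # assumed value of the small parameter
s = np.linspace(0.0, 1.0, 3201)       # 200 samples per microscopic period
x, y = wafer_surface(s, eps)

# An overhang shows up as x decreasing while s increases.
has_overhang = bool(np.any(np.diff(x) < 0.0))
print("surface has overhangs:", has_overhang)
```

Setting alpha to zero recovers a surface in functional form, for which x increases strictly with s.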

The idea behind the homogenization technique to be used is the elimination of dependence
on the microscopic parameter $\sigma$. To this end, the surface representation is formally
inflated to three dimensions depending on the two parameters $s$ and $\sigma$ independently, that
is, no relationship between $s$ and $\sigma$ is assumed now. This results in a three-dimensional
representation of the wafer surface $\Gamma_w$, parameterized by $s$ and $\sigma$ independently:

$$(x, y, z) = \bigl(s + \varepsilon\,\alpha(\sigma),\ \varepsilon\,\beta(s, \sigma),\ \sigma + \alpha(\sigma)\bigr), \qquad 0 < s < 1,\ 0 < \sigma < 1/\varepsilon. \tag{8}$$

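For this surface, a homogenization technique as in [3] is used to find the appropriate
problem for the leading term of the bulk solution $w_0$:

$$\frac{\partial w_0}{\partial t} = \operatorname{div}_{xy}\bigl(D(x, y)\,\nabla_{xy} w_0\bigr) + R_g(w_0, x, y) \tag{9}$$

with the boundary conditions

$$-e_1^T(-D\,\nabla_{xy} w_0) = 0 \quad \text{at } x = 0, \tag{10}$$

$$e_1^T(-D\,\nabla_{xy} w_0) = 0 \quad \text{at } x = X, \tag{11}$$

$$w_0 = c_{top}(x) \quad \text{at } y = Y, \tag{12}$$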

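$$-e_2^T(-D\,\nabla_{xy} w_0) = \bar{\sigma}\,S \quad \text{at } y = 0, \tag{13}$$

where $e_2 = (0, 1)^T$ is the second unit vector and the macroscopic factor $\bar{\sigma}$ accounts for the
increase in surface area inside the feature clusters.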

This  is  the  simplified  problem  on  a  rectangular  domain  that  can  be  efficiently  solved  by 
numerical  methods.  The  key  is  that  the  effect  of  the  micro  structured  surface  is  summarized 
into  a  macroscopic  correction  factor  in  the  flux  condition  at  the  wafer  surface. 


NUMERICAL  DEMONSTRATION 


To demonstrate the method, a mathematical test problem has been solved with the
dimensionless parameter $\varepsilon$ chosen sufficiently large to allow for a classic solution by full
resolution of the surface. First, the results denoted by the solid lines in Figures 2 and 3
are obtained from a solution of the simplified problem (9)-(13). Second, the dotted lines
represent the solution obtained by solving the original problem (1)-(5) after transforming
its domain onto a rectangle.

The problem uses $\varepsilon = 1/16$ and the surface function $y = h(x) = \varepsilon\,h(x, x/\varepsilon)$,
$h(x, \xi) = 4x(1 - x)\sin(\omega_0 \xi)$ with $\omega_0 = 32\pi\varepsilon$ on the unit square, hence $X = Y = 1$.

Figure 2 shows the concentration levels throughout the domain. They clearly agree
everywhere except in a region close to the surface, where oscillations are introduced by
actually resolving the surface structure. Figure 3 shows that the net flux into the surface
predicted by the asymptotic solution captures the average of the true flux. Both facts
demonstrate  that  the  method  is  capable  of  modeling  the  quantities  relevant  to  the  interfaces 
with  the  reactor  scale  model  (via  the  concentration  levels  near  the  top)  and  the  feature  scale 
model  (via  the  average  flux). 

A  demonstration  of  the  capability  of  the  mesoscopic  scale  model  to  study  feature-to- 
feature  effects  for  a  physical  example  is  contained  in  [2]. 

This paper discusses a model designed to deal with pattern dependencies of
deposition processes. It is a mesoscopic scale model in the sense that it deals
with spatial scales on the order of $10^{-3}$ m to $10^{-2}$ m, which is intermediate
between reactor scale and feature scale. This model accounts for the effects of
the microscopic surface structure via suitable averages obtained by a
homogenization technique from asymptotic analysis. Two studies on the LPCVD of
silicon dioxide from tetraethoxysilane are presented to demonstrate the
mesoscopic scale model. The first study shows the effects of microloading in regions
of higher feature density. The second study shows the effects of varying
operating conditions on loading and introduces a generalized Damkoehler number,
which includes information about the surface patterns, to quantify the degree
of transport limitations. Some thoughts on how this model can be used to
bridge reactor scale and feature scale models are presented.


INTRODUCTION 


The  trend  towards  single  wafer  reactors  (SWR)  for  deposition  and  etch  processes  in 
the  microelectronics  industry  is  expected  to  continue  as  wafer  size  continues  to  increase. 
In  order  to  make  single  wafer  reactors  more  economically  attractive,  deposition  and  etch 
processes  are  being  run  at  high  rates  to  maintain  reasonable  throughputs. High  rates  can 
lead  to  nonuniformities  on  the  wafer  scale  because  of  depletion  of  reactants  and  transport 
limitations.  Reactor  scale  models  (RSM)  for  flow,  heat  transfer,  and  chemical  reactions  are 
well  developed  for  single  wafer  reactors  used  for  thermally  driven  chemical  processes  and 
susceptor  based  heating.  In  fact,  reactor  scale  modeling  and  simulation  have  been  used  to 
help  design  reactors  and  establish  operating  conditions  which  provide  acceptable  wafer  state 
uniformities  for  specific  processes  [1]. 

Although not central to the purpose of this paper, the single wafer reactor considered
is  assumed  to  be  radially  symmetric  and  in  stagnation  point  flow.  The  wafer  rests  on  a 
heated  susceptor  in  the  center  of  the  reactor  chamber,  and  the  reactant  gases  are  introduced 
through  a  shower  head  at  the  top.  A  schematic  cross-section  with  a  rough  sketch  of  the  flow 
pattern  is  shown  in  Figure  1. 

High  deposition  rates  can  also  lead  to  nonuniformities  on  the  feature  scale,  even  in  the 
absence of wafer scale nonuniformities; i.e., the film thickness may not be uniform inside
features. This loss of conformality is well understood and feature scale models (FSM) and
simulators are fairly well developed for thermally driven deposition processes [2]. Nevertheless,
the predictions of feature scale simulators depend on the species fluxes into the feature
from  the  source  volume.  In  general,  these  fluxes  cannot  be  obtained  from  experiments.  A 
complete  model  for  a  deposition  process  would  predict  local  wafer  state  based  on  reactor  set 
points. 

Attempts  have  been  made  to  combine  reactor  scale  and  feature  scale  models  to  obtain 
more  realistic  simulation  results  [3,  4].  In  those  studies,  the  models  were  used  sequentially. 
Namely,  species  concentrations  predicted  by  the  reactor  scale  model  are  used  to  compute 
the  species  fluxes  to  the  wafer  surface,  which  are  used  as  input  for  a  representative  feature 
at  a  particular  position  on  the  wafer.  No  information  was  fed  back  from  the  feature  scale 
to  the  reactor  scale,  e.g.  by-product  concentrations  were  computed  by  the  reactor  scale 
model  assuming  a  flat  surface  and  then  used  as  input  for  the  feature  scale  model  applied  to 
a  representative  feature.  This  approach  does  not  allow  for  the  effects  of  feature  density  to  be 
taken into account. Also, any direct combination of these models must suffer from the vast
differences in length scales between the reactor scale ($10^{-1}$ m) and the feature scale ($10^{-6}$ m).

High  rates  can  also  lead  to  pattern  dependencies;  i.e.,  local  deposition  rates  might  depend 
upon  the  local  pattern  density.  In  order  to  account  for  pattern  dependencies,  we  present  a 
mesoscopic  scale  model  (MSM)  on  the  scale  of  millimeters.  To  obtain  an  integrated  process 
simulator, it is envisioned that the MSM will be applied over selected areas, each encompassing
several clusters of features on the wafer. For each of the MSMs, there would be several
FSMs, each representing a typical feature in one of the feature clusters. See Figure 1 for
an example arrangement. In this way, the MSM would bridge the length scale differences
between  the  reactor  scale  and  the  feature  scale.  By  encompassing  several  feature  clusters, 
the  MSM  also  accounts  for  the  effects  of  varying  density  of  feature  clustering  and  of  cluster 
spacing  to  study  feature-to-feature  effects.  To  demonstrate  these  capabilities,  the  MSM  can 
also  be  used  in  a  stand-alone  mode,  as  in  this  paper. 

Future  papers  will  deal  with  the  interchange  of  information  between  the  different  models. 
It  is  clear  that  the  details  of  the  film  profile  will  be  provided  by  a  FSM,  since  it  resolves 
the  length  scale  of  an  individual  feature.  Since  the  particle  flow  on  the  mesoscopic  scale  is 
much  faster  than  the  surface  growth,  it  is  justified  to  treat  the  surface  as  fixed  in  time  for 
the  solution  of  the  MSM.  The  surface  would  then  be  updated  periodically,  as  appropriate 
to  the  time  scale  of  the  surface  growth,  by  the  FSM  using  flux  data  provided  by  the  MSM. 
Therefore,  since  this  paper  presents  the  MSM  in  stand-alone  mode  to  demonstrate  its  basic 
functionality, the surface is assumed given and fixed in time. Similarly, it is assumed that
the  information  at  the  gas-phase  interface  is  supplied  from  a  RSM. 

Mathematically,  the  domain  of  the  MSM  is  comprised  of  the  gas  phase  just  above  one  or 
more feature clusters on the wafer surface. Assuming that the pressure is sufficiently high,
the model consists of a reaction-diffusion equation with associated boundary conditions for
each  chemical  species  in  the  gas  domain;  specifically  at  the  wafer  surface,  a  flux  condition 
is  imposed.  It  is  this  boundary  condition  that  makes  the  problem  numerically  challenging, 
since  it  is  unreasonable  even  for  a  mesoscopic  scale  model  (on  the  scale  of  millimeters)  to 
accurately  resolve  the  patterns  on  the  scale  of  the  features. 

We  use  a  homogenization  technique  from  asymptotic  analysis  to  replace  the  patterned 
surface  by  a  flat  surface  by  taking  into  account  the  increase  in  surface  area  inside  the  feature 
clusters.  The  approach  rests  on  the  observation  that  patterned  surfaces  can  be  viewed  as 
possessing  two  length  scales,  one  macroscopic  length  scale  resolving  the  variations  from  one 
cluster  to  the  next  and  one  microscopic  scale  resolving  the  changes  from  one  feature  to  the 
next. The fundamental idea of our approach is to separate the effect of the two scales, then
average over the microscopic scale while retaining the influence of macroscopic variations. The
averaging is based on the mathematical assumption of periodicity in the microscopic (but not
in the macroscopic) variable; this corresponds to the physical assumption that all features in
one cluster are identical (on the microscopic scale), while allowing for variations between the
clusters (on the macroscopic scale). In a global process simulator, this assumption means
that  there  has  to  be  one  feature  scale  model  corresponding  to  each  cluster  of  features  which 
are  considered  identical. 


This paper is divided into two mathematical and three application sections. In the next
section, the mathematical derivation is sketched for surfaces given in functional form for
simplicity; the full mathematical derivation has been given in [5]. The extension to
parameterized surfaces (allowing overhangs on the feature sides) is contained in [6]. Following
that, the numerical method used in the simulations is presented. The first of the application
sections introduces the example chemistry of silicon dioxide deposition from TEOS.
Afterwards, two sets of results are presented with the implementation used in stand-alone mode.
First, the effect of varying feature density inside clusters for one set of operating conditions
is analyzed. Second, a study of varying operating conditions for one surface example is
presented.


THE  MATHEMATICAL  MODEL 


The domain for the mathematical problem is chosen to encompass several feature clusters
and to extend into the gas-phase. For the purposes of the presentation here, the species
concentrations just above the wafer surface are assumed to vary slowly in space laterally
and the features are assumed to be infinite trenches. However, [5] contains a general
three-dimensional derivation. The pattern on the wafer is the only rough surface and is given by
the function $y = h(x)$. The coordinate system is chosen with $x$ ranging from 0 to $X$ along
the wafer surface and $y$ ranging from 0 to $Y$ perpendicular to the surface. Figure 2 shows
an example of how four feature clusters might be introduced.

For  the  examples  considered  in  this  paper,  the  pressure  is  high  enough  (Knudsen  number 
low  enough  <  0.01)  that  the  dimensionless  problem  for  the  flow  of  a  gaseous  species  in  and 
close  to  the  boundary  layer  above  the  surface  is  given  by  a  reaction-diffusion  type  equation. 
This  equation  reads  for  one  species 


$$\frac{\partial c}{\partial t} = -\operatorname{div} F + R_g(c, x, y), \qquad F = -D(x, y)\,\nabla c, \tag{1}$$


where $c(x, y)$ denotes the molar concentration and $F(x, y)$ the associated species flux, where
the dependence on time $t$ is suppressed in the notation for compactness. $D$ is the diffusivity
matrix for gaseous species in the mixture and $R_g$ the gas-phase reaction term. On both sides
of the domain, no flux conditions are used

$$-e_1 \cdot F = 0 \quad \text{at } x = 0, \tag{2}$$

$$e_1 \cdot F = 0 \quad \text{at } x = X, \tag{3}$$


where $e_1 = (1, 0)^T$ denotes the first unit vector corresponding to the x-direction. The
boundary condition at the gas-phase interface is given by the known function $c_{top}(x)$, which
represents a (trial) solution of the reactor scale model, as

$$c = c_{top}(x) \quad \text{at } y = Y. \tag{4}$$


Along the wafer surface, the flux is given as a function of the species generation rates on the
surface, namely

$$\nu \cdot F = S(c, x, y) \quad \text{at } y = h(x), \tag{5}$$

where $\nu$ is the unit outward normal vector. The complete problem is then given by (1)
through  (5)  together  with  the  definition  of  the  domain  in  Figure  2.  Notice  that  while  the 
appropriate  resolution  of  the  differential  equation  (1)  does  not  pose  any  major  challenges  for 
modern  computers,  the  proper  resolution  of  boundary  condition  (5)  for  patterns  consisting 
of  several  thousand  features  is  still  unreasonable. 


For the purposes of the asymptotic analysis, the surface function is written as a function of
the two variables $x$ for the macroscopic changes and $x/\varepsilon$ for the microscopic changes. Here, $\varepsilon$ is
some chosen dimensionless small parameter of the problem. For instance, denote the average
initial feature width by $x_w$, and let the space between features be $x_s$. Using $x_p = x_w + x_s$
for shorthand, then $\varepsilon$ is chosen as

$$\varepsilon = \frac{x_p}{X}, \tag{6}$$

where $X$ is the width of the domain, again. Indeed, we use $x_w = x_s = x_p/2$ for simplicity.
Notice that for a small $0 < \varepsilon \ll 1$, $\xi = x/\varepsilon$ changes much more rapidly than $x$ itself. With
this idea, the surface function, which is itself of microscopic magnitude, can be written as

$$h(x) = \varepsilon\,h\!\left(x, \frac{x}{\varepsilon}\right). \tag{7}$$

For the mathematical analysis, this function $h(x, \xi)$ that explicitly distinguishes the effect of
both length scales is assumed periodic in $\xi$ with period 1. With the definition for $\varepsilon$ above,
this means that the patterns on the wafer surface are locally periodic with period $x_p$ in the
macroscopic variable $x$; this assumption is local in the sense that the definition of $h(x, \xi)$
still allows for variations resulting from the dependence on the macroscopic variable $x$. In
physical terms, neighboring features are assumed to be identical, but clusters are allowed to
differ from each other. Note that the normal vector in equation (5) computes to


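$$\nu = \frac{1}{\sqrt{1 + \left(\varepsilon\,\dfrac{\partial h}{\partial x} + \dfrac{\partial h}{\partial \xi}\right)^2}} \begin{pmatrix} \varepsilon\,\dfrac{\partial h}{\partial x} + \dfrac{\partial h}{\partial \xi} \\ -1 \end{pmatrix} \tag{8}$$

for this surface function.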


A homogenization technique from asymptotic analysis is now applied to separate the two
length scales using the representation

$$c(x, y) = w_0(x, y) + w\!\left(x, \frac{x}{\varepsilon}, \frac{y}{\varepsilon}\right). \tag{9}$$

In this representation, we are interested in computing the outer solution $w_0$ in the bulk of
the domain. It depends only on the macroscopic variables $x$ and $y$. The correction term is
assumed to vanish except inside the boundary layer close to the surface, that is, we assume
that $w(x, \xi, \eta)$, with $\xi = x/\varepsilon$ and $\eta = y/\varepsilon$, approaches zero as $\eta$ approaches infinity, for all
$x$ and $\xi$. Since this function is surface related, it is also assumed to be periodic in the
microscopic variable $\xi$ with period 1, that is, $w(x, \xi + 1, \eta) = w(x, \xi, \eta)$ for all $x$ and $\eta$.

Letting $\varepsilon$ go to zero shows that indeed $w(x, \xi, \eta)$ is of order $\varepsilon$ outside the boundary layer.
Moreover, a simplified boundary condition at the wafer surface can be derived that only
depends on the macroscopic variables. The full problem for the bulk solution $w_0$ then reads

$$\frac{\partial w_0}{\partial t} = -\operatorname{div} F_0 + R_g(w_0, x, y), \qquad F_0 = -D(x, y)\,\nabla w_0 \tag{10}$$

with the boundary conditions

$$\begin{aligned}
-e_1 \cdot F_0 &= 0 && \text{at } x = 0, \\
e_1 \cdot F_0 &= 0 && \text{at } x = X, \\
w_0 &= c_{top}(x) && \text{at } y = Y, \\
-e_2 \cdot F_0 &= \sigma(x)\,S(w_0, x, y) && \text{at } y = 0,
\end{aligned} \tag{11}$$

where $e_2 = (0, 1)^T$ denotes the second unit vector corresponding to the y-direction and where
$\sigma(x)$ is given for all $x$ by

$$\sigma(x) = \int_0^1 \|\nu_0\|_2 \, d\xi, \qquad \nu_0 = \left(\frac{\partial h}{\partial \xi},\ -1\right)^T. \tag{12}$$

Notice that the last condition in (11) now holds along the flat surface $y = 0$. Therefore, a numerical
solution of the reaction-diffusion problem given by (10) through (12) becomes tractable, since
there are no patterns that need to be resolved. All effects of the microscopic variations of
the surface have been absorbed into the macroscopic function $\sigma(x)$.

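
A heuristic way to see where the factor in the flat-surface condition comes from is sketched below; this is only a plausibility argument and not the rigorous derivation of [5]. The amount of species consumed per unit length in $x$ is the rate $S$ times the arc length of the surface. For $y = \varepsilon\,h(x, x/\varepsilon)$, the arc length element is

$$ds = \sqrt{1 + \left(\varepsilon\,\frac{\partial h}{\partial x} + \frac{\partial h}{\partial \xi}\right)^2}\,dx = \left(\sqrt{1 + \left(\frac{\partial h}{\partial \xi}\right)^2} + O(\varepsilon)\right) dx = \bigl(\|\nu_0\|_2 + O(\varepsilon)\bigr)\,dx.$$

Averaging the leading term over one period $0 \le \xi \le 1$ of the fast variable gives the ratio of true surface area to flat area, which is the factor multiplying $S$ in the boundary condition at $y = 0$. On a flat area, $\partial h / \partial \xi = 0$ and the factor equals 1.
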


THE  NUMERICAL  METHOD 


The  numerical  solution  of  the  equivalent  problem  given  by  (10)  through  (12)  uses  the 
conservative  formulation  of  the  associated  dimensionless  problem  transformed  to  the  unit 
square.  Second  order  centered  differences  in  the  spatial  variables  are  used  throughout,  while 
the  implicit  Euler  method  is  used  to  discretize  the  time  variable.  First,  a  pseudo  steady-state 
solution  at  the  initial  time  is  computed.  The  system  of  nonlinear  equations  (arising  from 
the  spatial  discretization)  is  solved  by  a  fixed-point  iteration  using  a  relaxation  technique 
for  improved  stability  whenever  needed.  Then,  solutions  at  later  times  are  computed  by 
discretizing  the  transient  problem.  Again,  the  system  of  nonlinear  equations  is  solved  by  a 
fixed-point  iteration  with  relaxation.  A  solution  is  accepted  if  it  differs  from  the  previous 
iterate by less than $10^{-5}$ in the rms-norm. Since the surface is assumed time-independent for
this paper, the pseudo steady-state solution at the initial time suffices. The numerical grid
uses 100 points in the x-direction along the wafer surface and 50 points in the y-direction
perpendicular to the surface.
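
The fixed-point iteration with relaxation can be illustrated on a laterally uniform, steady version of the problem with no gas-phase reactions and a constant scalar diffusivity D. In that case the concentration is linear in y, and the flux condition at y = 0 reduces to a single equation for the surface concentration w_s, namely w_s = c_top - (Y/D) sigma S(w_s). The sketch below is a toy under these assumptions and is not the authors' code: it uses a hypothetical linear surface rate S(w) = k w so that the exact answer c_top / (1 + sigma k Y / D) is available for comparison, and all numerical values are made up.

```python
def surface_concentration(c_top, sigma, rate, D, Y,
                          omega=0.5, tol=1e-5, max_iter=1000):
    """Relaxed fixed-point iteration for w = c_top - (Y / D) * sigma * rate(w).

    omega = 1 is the plain fixed-point iteration; 0 < omega < 1 under-relaxes.
    Returns the converged value and the number of iterations used.
    """
    w = c_top  # initial guess: no depletion at the surface
    for iteration in range(1, max_iter + 1):
        w_new = c_top - (Y / D) * sigma * rate(w)     # plain fixed-point update
        w_next = (1.0 - omega) * w + omega * w_new    # relaxation step
        if abs(w_next - w) < tol:                     # accept if iterates agree
            return w_next, iteration
        w = w_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical dimensionless values chosen for illustration only.
c_top, sigma, k, D, Y = 1.0, 2.3, 1.0, 1.0, 0.87

def linear_rate(w):
    return k * w

w_s, n_iter = surface_concentration(c_top, sigma, linear_rate, D, Y)
w_exact = c_top / (1.0 + sigma * k * Y / D)
print(w_s, w_exact, n_iter)
```

With these values the plain iteration (omega = 1) does not converge, because the update map has slope of magnitude sigma k Y / D, which is about 2 here; the relaxation factor omega = 0.5 brings the slope magnitude below one. This is one way in which relaxation can improve stability.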


THE  PHYSICAL  EXAMPLE 


The example chosen for numerical demonstration is thermally induced deposition of silicon
dioxide from tetraethoxysilane (TEOS) on silicon wafers with oxygen as an inert gas.
Although  more  complex  kinetic  models  for  this  deposition  system  have  been  proposed,  for 
instance  in  [7],  we  use  the  model  detailed  by  Adams  and  Capio  [8];  this  simple  chemistry  is 
chosen,  because  it  involves  a  single  species  (TEOS)  and  suffices  to  demonstrate  the  method. 
The  stoichiometry  is  taken  to  be 


$$\mathrm{TEOS} \rightarrow \mathrm{SiO_2} + \text{by-products}. \tag{13}$$

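The reaction rate expression for the sole solid species $\mathrm{SiO_2}$ is modeled as

$$R_{\mathrm{SiO_2}} = k_1 \exp\!\left(-\frac{E_a}{RT}\right) \frac{(P_{TEOS})^{0.5}}{1 + k_2\,(P_{TEOS})^{0.5}} \tag{14}$$

with the coefficients

$$k_1 = 37.55\ \mathrm{mol / (s\ cm^2)}, \tag{15}$$

$$k_2 = 0.25\ 1/\sqrt{\mathrm{torr}}, \tag{16}$$

$$E_a = 46.5\ \mathrm{kcal / mol}, \tag{17}$$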


where $E_a$ is the activation energy reported by Adams and Capio [8] and where the
coefficients $k_1$ and $k_2$ have been obtained by fitting a curve to the deposition rate plot in Figure 2
of [8]. Furthermore, $R$ denotes the universal gas constant, and $T$ is the ambient temperature.
For this single-component demonstration, the net flux of TEOS to the surface is given by

$$S(c, x, y) = R_{\mathrm{SiO_2}}(P_{TEOS}), \tag{18}$$

where the partial pressure of TEOS is computed from the molar concentration of TEOS via
$P_{TEOS} = (c_{TEOS} / c_{total})\,P_{total}$. It is assumed that there are no gas-phase reactions, that is,
$R_g = 0$ in (1). The remaining operating conditions are the ambient temperature $T$ and the
total pressure $P_{total}$, both of which together determine the total concentration $c_{total}$ via the
ideal gas law. The TEOS concentration at the gas-phase interface is obtained using again
the ideal gas law for a partial pressure of TEOS held constant at 0.20 torr.
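
The surface chemistry above can be evaluated with a few lines of code. The following sketch uses the coefficients quoted in the text and assumes the square-root pressure dependence in both numerator and denominator of the rate expression; the function names, the unit conversions, and the choice of returning concentrations in mol/cm^3 are assumptions made for this illustration.

```python
import math

R_KCAL = 1.987204e-3     # universal gas constant in kcal / (mol K)
R_SI = 8.314462          # universal gas constant in J / (mol K)
PA_PER_TORR = 133.322    # pascal per torr

K1 = 37.55               # mol / (s cm^2), from the text
K2 = 0.25                # 1 / sqrt(torr), from the text
EA = 46.5                # kcal / mol, from the text

def total_concentration(p_total_torr, temperature):
    """Ideal gas law c = P / (R T), returned in mol / cm^3."""
    mol_per_m3 = p_total_torr * PA_PER_TORR / (R_SI * temperature)
    return mol_per_m3 * 1.0e-6

def teos_partial_pressure(c_teos, p_total_torr, temperature):
    """P_TEOS = (c_TEOS / c_total) * P_total, in torr."""
    return c_teos / total_concentration(p_total_torr, temperature) * p_total_torr

def sio2_rate(p_teos_torr, temperature):
    """Deposition rate in mol / (s cm^2) for a TEOS partial pressure in torr."""
    root_p = math.sqrt(p_teos_torr)
    arrhenius = math.exp(-EA / (R_KCAL * temperature))
    return K1 * arrhenius * root_p / (1.0 + K2 * root_p)

# Conditions used later in the text: 5 torr total, 1000 K, 0.2 torr TEOS on top.
T, P_TOTAL = 1000.0, 5.0
c_total = total_concentration(P_TOTAL, T)
c_top = total_concentration(0.2, T)   # TEOS concentration at the gas-phase interface
rate_top = sio2_rate(teos_partial_pressure(c_top, P_TOTAL, T), T)
print(c_total, c_top, rate_top)
```

In a solver, the flux condition would call `sio2_rate` with the partial pressure computed from the local surface concentration instead of the value at the top of the domain.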


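The procedure used to estimate diffusivities is adopted from Reid, Prausnitz, and
Poling [9, page 587] as

$$D_{ij} = \frac{\left[3.03 - \dfrac{0.98}{\sqrt{m_{ij}}}\right] 10^{-3}\,T^{3/2}}{P_{total}\,\sqrt{m_{ij}}\;\sigma_{ij}^2\;\Omega_{D,ij}} \tag{19}$$

with the collision integral $\Omega_{D,ij}$ as suggested by Neufeld, Janzen, and Aziz [10]

$$\Omega_{D,ij} = \frac{1.06036}{(T_{ij})^{0.15610}} + \frac{0.19300}{\exp(0.47635\,T_{ij})} + \frac{1.03587}{\exp(1.52996\,T_{ij})} + \frac{1.76474}{\exp(3.89411\,T_{ij})}, \tag{20}$$

where

$$m_{ij} = 2\left(\frac{1}{m_i} + \frac{1}{m_j}\right)^{-1}, \tag{21}$$

$$\sigma_{ij} = \frac{\sigma_i + \sigma_j}{2}, \tag{22}$$

$$\frac{\epsilon_{ij}}{\kappa} = \sqrt{\frac{\epsilon_i}{\kappa}\,\frac{\epsilon_j}{\kappa}}, \tag{23}$$

$$T_{ij} = \frac{\kappa T}{\epsilon_{ij}}. \tag{24}$$
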

The variables used here are summarized in Table 1. For the approximation of the
Lennard-Jones parameters $\sigma_i$ and $\epsilon_i/\kappa$ following [9] as

$$\sigma_i = 1.18\,(v_i)^{1/3}, \tag{25}$$

$$\frac{\epsilon_i}{\kappa} = 1.15\,T_i, \tag{26}$$

the liquid molar volume $v_i$ and the normal boiling point $T_i$ of species $i$ are required. Their
values for TEOS have been taken from the material safety data sheet (MSDS) and for
oxygen ($\mathrm{O_2}$) from appendix A (no. 59) in [9]. For convenience, these values are summarized in
Table 2. Finally, the diffusivity $D_{im}$ of species $i$ in a multicomponent mixture is approximated
as suggested by Wilke [11]

$$D_{im} = \frac{1 - x_i}{\sum_{j \neq i} \dfrac{x_j}{D_{ij}}}, \tag{27}$$

where the mole fraction is computed by

$$x_i = \frac{c_i}{\sum_j c_j}. \tag{28}$$

Then the diffusivity matrix in equation (1) is given by

$$D = \begin{pmatrix} D_{im} & 0 \\ 0 & D_{im} \end{pmatrix} \tag{29}$$

for $i$ = TEOS.
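
The diffusivity estimate above can be sketched in code as follows. The unit conventions (temperature in K, pressure in bar, molecular diameters in Angstrom, molar masses in g/mol, result in cm^2/s) are those I associate with this correlation in [9] and should be treated as an assumption here, as should all function names. Because Tables 1 and 2 are not reproduced in this section, the usage example does not use TEOS data; it uses commonly tabulated Lennard-Jones parameters for nitrogen and oxygen purely as a sanity check.

```python
import math

def collision_integral(t_star):
    """Neufeld-Janzen-Aziz fit for the diffusion collision integral."""
    return (1.06036 / t_star ** 0.15610
            + 0.19300 / math.exp(0.47635 * t_star)
            + 1.03587 / math.exp(1.52996 * t_star)
            + 1.76474 / math.exp(3.89411 * t_star))

def lennard_jones_estimate(molar_volume, boiling_point):
    """Estimate (sigma_i, eps_i/kappa) from liquid molar volume and boiling point."""
    return 1.18 * molar_volume ** (1.0 / 3.0), 1.15 * boiling_point

def binary_diffusivity(T, p_bar, m_i, m_j, sig_i, sig_j, eps_i, eps_j):
    """Binary diffusivity D_ij; eps_i and eps_j are eps/kappa in K."""
    m_ij = 2.0 / (1.0 / m_i + 1.0 / m_j)
    sig_ij = 0.5 * (sig_i + sig_j)
    t_ij = T / math.sqrt(eps_i * eps_j)        # dimensionless temperature
    prefactor = (3.03 - 0.98 / math.sqrt(m_ij)) * 1.0e-3
    return (prefactor * T ** 1.5
            / (p_bar * math.sqrt(m_ij) * sig_ij ** 2 * collision_integral(t_ij)))

def mixture_diffusivity(i, x, d_binary):
    """Wilke approximation: D_im = (1 - x_i) / sum_{j != i} x_j / D_ij."""
    denom = sum(x[j] / d_binary[i][j] for j in range(len(x)) if j != i)
    return (1.0 - x[i]) / denom

# Sanity check with illustrative N2/O2 parameters (not data from the paper).
d_n2_o2 = binary_diffusivity(T=273.0, p_bar=1.01325,
                             m_i=28.013, m_j=31.999,
                             sig_i=3.798, sig_j=3.467,
                             eps_i=71.4, eps_j=106.7)
print(d_n2_o2)  # in cm^2/s under the unit assumptions stated above
```

For a mixture of only two species, `mixture_diffusivity` returns the binary diffusivity itself, which is the situation for TEOS in oxygen.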


For demonstration purposes, a mesoscopic domain length of $X = 8$ mm, encompassing
four feature clusters of length 1 mm each, was chosen. The domain was chosen to extend
into the gas-phase to a value of $Y = 0.8$ mm. The individual features are infinite trenches
of feature aspect ratio 2 with initial width 1 μm and a distance of 1 μm between features;
hence, the small scale period within the feature clusters is $x_p = 2$ μm. At this spacing, there
are 500 features per cluster. The dimensionless small scale parameter $\varepsilon$ is chosen as the ratio
$\varepsilon = x_p / X = 0.25 \cdot 10^{-3}$.
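
The numbers in this paragraph follow from simple arithmetic, reproduced in the sketch below. The last two lines go beyond the text: they estimate how many lateral grid points a fully resolved surface would need if every period in the domain were patterned and each period were resolved with a hypothetical ten points, to put the 100 lateral grid points of the homogenized problem in perspective.

```python
X = 8.0e-3                 # domain length in m (8 mm)
cluster_length = 1.0e-3    # length of one feature cluster in m (1 mm)
x_w = 1.0e-6               # initial feature width in m (1 micron)
x_s = 1.0e-6               # spacing between features in m (1 micron)

x_p = x_w + x_s            # small scale period
eps = x_p / X              # dimensionless small scale parameter
features_per_cluster = round(cluster_length / x_p)

# Assumption for illustration: ten grid points per period, whole domain patterned.
points_per_period = 10
resolved_grid_points = points_per_period * round(X / x_p)

print(eps, features_per_cluster, resolved_grid_points)
```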

For the purposes of this demonstration, this structure is approximated by an explicitly
given, smooth function. The dimensional form of the surface function is given by

$$y = h(x) = \begin{cases} a + a\cos\!\left(2\pi\,\dfrac{x}{x_p}\right) & \text{if $x$ is inside a feature cluster,} \\[1ex] 2a & \text{if $x$ is inside a flat area} \end{cases}$$

with constant amplitude $a = A\,x_p / 4 = 1$ μm, where $A = 2$ is the feature aspect ratio
of the trenches. Figure 2 shows an example of this function with a larger $x_p$ chosen for
better visibility. As an example for greater feature density, 1000 features per cluster can be
approximated by replacing $2\pi$ by $4\pi$ in the argument of the cosine. Their small scale period
is then $x_p = 1$ μm. Maintaining the amplitude $a = 1$ μm, their feature aspect ratio is then 4.
The ratio of total area over flat area used here has been chosen for demonstration purposes,
but is considered of interest to advanced manufacturing processes.

Using the definition of $\varepsilon$ as $x_p / X$, a non-dimensionalization procedure with reference
length $X$ is applied to yield a definition for the function $h(x, \xi)$ with $\xi = x/\varepsilon$. From this
result, the coefficient $\sigma(x) = \int_0^1 \|\nu_0\|_2 \, d\xi$ with $\nu_0(x, \xi) = (\partial h / \partial \xi,\ -1)^T$ is computed via numerical
integration using the trapezoidal rule.
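
The integration described here can be sketched as follows. The dimensionless surface function used below, h(x, xi) = (a / x_p)(1 + cos(2 pi xi)) inside a cluster with a / x_p = 0.5, is my reading of the non-dimensionalization with reference length X and should be treated as an assumption, as should the function names and the number of quadrature points. Under this reading, the trapezoidal rule gives factors of roughly 2.3 for 500 features per cluster and 4.2 for 1000 features per cluster.

```python
import numpy as np

def area_factor(dh_dxi, n=2001):
    """Trapezoidal rule for the integral over [0, 1] of ||(dh/dxi, -1)||_2."""
    xi = np.linspace(0.0, 1.0, n)
    integrand = np.sqrt(dh_dxi(xi) ** 2 + 1.0)
    d_xi = xi[1] - xi[0]
    # Composite trapezoidal rule written out explicitly.
    return d_xi * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

A_OVER_XP = 0.5   # amplitude a = 1 micron divided by period x_p = 2 micron

def dh_dxi_500(xi):
    # derivative of (a / x_p) * (1 + cos(2 pi xi)): 500 features per cluster
    return -A_OVER_XP * 2.0 * np.pi * np.sin(2.0 * np.pi * xi)

def dh_dxi_1000(xi):
    # 2 pi replaced by 4 pi: 1000 features per cluster
    return -A_OVER_XP * 4.0 * np.pi * np.sin(4.0 * np.pi * xi)

def dh_dxi_flat(xi):
    # flat area: constant height, zero slope
    return np.zeros_like(xi)

factor_500 = area_factor(dh_dxi_500)
factor_1000 = area_factor(dh_dxi_1000)
factor_flat = area_factor(dh_dxi_flat)
print(factor_500, factor_1000, factor_flat)
```

Since the factor depends on x only through the cluster in which x lies, it is piecewise constant here, and three evaluations describe the whole domain for both surface configurations.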


THE  EFFECTS  OF  FEATURE  DENSITY  IN  CLUSTERS 

This  section  demonstrates  the  capability  of  the  mesoscopic  scale  model  to  study  feature- 
to-feature  effects  on  the  scale  of  several  feature  clusters,  i.e.,  how  varying  feature  density 
inside  clusters  affects  the  concentration  levels  and  the  species  fluxes  at  the  wafer  surface. 
The  conditions  are  as  specified  in  the  previous  section;  specifically,  the  total  pressure  is 
5 torr, and the temperature is 1000 K, while the partial pressure of TEOS at the top of the
domain  is  fixed  at  0.2  torr.  For  Case  1,  the  wafer  surface  is  taken  to  encompass  four  clusters 
of  500  features  as  described  in  the  previous  section,  while  Case  2  uses  two  clusters  of  1000 
features and two of 500 features, each. In both cases, the steady-state solutions for these
time-independent surfaces are computed by the method detailed above. Preliminary results
have  been  shown  in  [12];  however,  an  unrealistically  large  value  for  a  was  used  in  that  work. 

Figures  3  and  4  show  the  concentration  profiles  throughout  the  two-dimensional  domain. 
As  expected,  the  concentration  levels  are  lower  closer  to  the  wafer  surface  than  at  the  gas- 
phase interface at the top. They are also lower inside the feature clusters than in the flat
areas  in  between  them;  this  is  explained  by  the  increase  in  surface  reactions  due  to  the 
increase  in  surface  area  available  for  reactions  inside  the  feature  clusters.  It  can  be  observed 
in  Figure  4  that  the  concentration  levels  in  the  denser  clusters  are  lower  than  in  the  others. 
However,  the  concentration  levels  in  the  right-hand  side  of  the  domain  (encompassing  the 
clusters  with  500  features)  are  equal  in  both  cases;  this  can  be  verified  by  the  contour  plots 
for  both  solutions  in  Figure  5,  where  the  solid  lines  correspond  to  Case  1  and  the  dashed 
lines  to  Case  2. 

To  allow  a  more  quantitative  assessment  of  the  solution  properties,  the  species  fluxes 
of  TEOS  into  the  wafer  surface  are  shown  in  Figure  6.  This  is  one  of  the  most  important 
quantities  for  the  purpose  of  interfacing  with  a  feature  scale  model  as  well  as  to  characterize 
the  quality  of  the  solution  method.  Again,  the  solid  line  corresponds  to  Case  1  and  the  dashed 
line to Case 2. The graphs show the net fluxes per flat wafer area; these are the fluxes that
are seen by the macroscopic scale. The right-hand side of the boundary condition on the
wafer surface, $\sigma(x)\,S(c, x, 0)$ in (11), is plotted versus the macroscopic position along the wafer
surface $x$. As a basic observation, the flux of the depleted species TEOS into the surface is
higher inside the feature clusters than in the flat areas; the transition is discontinuous, since
the transition from flat area to the clusters is abrupt as well. This effect is commonly known
as microloading. Quantitatively, the flux inside the feature clusters is larger by a factor of
about 2.3 than in the flat areas, corresponding to the value of $\sigma$ there. Observe that the flux
decreases  in  the  interior  of  each  individual  feature  cluster  as  compared  to  the  edges  of  the 
cluster.  This  reflects  the  fact  that  in  the  center  of  a  cluster,  the  least  amount  of  TEOS  is 
available  for  reaction  and  resulting  deposition  as  compared  to  the  edges,  where  more  TEOS 
is  available  by  diffusion  from  the  flat  areas;  the  transition  between  these  areas  is  continuous, 
since  the  concentration  profile  changes  smoothly.  Finally,  Figure  6  also  shows  that  the  flux 
of Case 2 is approximately twice as high in the clusters with 1000 features as in the others.
This corresponds again to the higher value of the factor a, which is 4.2 in Case 2 as compared
to 2.3 in Case 1, each inside the feature clusters.

In summary, the increase in feature density in some of the feature clusters
increases the depletion of reactants in those areas due to an increase in available surface
area. This shows that the method predicts the concentration levels and species fluxes and
accounts  for  the  variations  from  one  feature  cluster  to  the  next  on  the  macroscopic  scale. 
The  method  is  also  capable  of  representing  the  variations  inside  a  feature  cluster.  The  scale 
of  these  variations  is  clearly  still  far  above  the  microscopic  scale  of  an  individual  feature,  but 
is  one  order  of  magnitude  below  that  of  the  macroscopic  scale  of  the  overall  die;  note  that 
predictions  on  the  scale  of  several  hundred  features  are  not  reasonably  obtainable  using  a 
feature  scale  model. 
THE EFFECTS OF OPERATING CONDITIONS


As  another  application  of  the  mesoscopic  scale  model,  this  section  presents  a  study  of  the 
effects  of  varying  operating  conditions  in  the  reactor.  Two  parameters  are  varied  throughout 
a  slightly  larger  operating  window  than  presented  in  [8].  The  total  pressure  throughout  the 
reactor is varied from 1 torr to 9 torr, and the temperature is varied from 900 K to 1100 K.
The  partial  pressure  of  TEOS  at  the  top  is  again  fixed  at  0.2  torr  as  are  all  other  parameters. 
In  Figures  7  through  11,  results  for  the  surface  of  Case  1  (four  clusters  with  500  features 
each)  are  marked  by  a  solid  line  with  circles  at  the  data  points,  while  Case  2  (two  clusters 
with  1000  features,  two  with  500  features)  is  marked  by  dashed  lines  with  crosses  at  the  data 
points;  in  addition  to  these  two  cases,  a  Case  0  representing  results  for  a  flat  wafer  is  shown 
by a dotted line with stars at the data points. The results in Figure 8 are shown for Case 1
only; proportional results have been observed for the other cases. Note that the operating
conditions used in the previous section lie at the center of the operating window considered
here.


Figure  7  shows  the  average  deposition  rate  per  total  area  versus  total  pressure  with  the 
temperature parameterized; this is the actual deposition rate observed on the surface of
the  wafer.  The  plot  shows  that  for  all  cases  the  deposition  rate  grows  as  the  temperature 
increases,  which  is  clear  from  the  rate  expression  (14).  It  shows  also  that  the  deposition 
rate  decreases  as  the  total  pressure  increases.  This  can  be  explained  by  the  observation 
that  the  diffusivity  is  inversely  proportional  to  the  total  pressure,  see  equation  (19);  hence, 
the  transport  of  the  reactant  species  to  the  surface  becomes  limiting,  as  the  total  pressure 
increases.  Moreover,  the  deposition  rate  is  higher  for  flat  wafers  (dotted  line  for  Case  0)  than 
for patterned wafers and lower for higher density patterns (dashed line for Case 2) than for
lower density patterns (solid line for Case 1).


Figure 8 plots the fractional difference in the flux versus total pressure with the temperature
parameterized. This difference is defined as

\[ F_{\mathrm{diff}} = \frac{F_{\max} - F_{\min}}{F_{\min}} \tag{31} \]


where F_max = max_x (a(x) S(c, x, 0)), and F_min is defined analogously. This means that F_min
really measures the flux level at the flat areas of the surface (see Figure 6 for instance); hence,
F_max - F_min measures the absolute increase in flux inside the feature clusters compared to
the one associated with the flat areas. F_diff is then a measure for the size of the relative
loading increase under the given operating conditions. It is expected that the concentration
gradients will increase with increasing temperature, since the reactivity increases faster than
the diffusivity. With increasing total pressure, we also expect transport limitations to restrict
reactant diffusion and increase the size of the gradients. In both cases, the measure F_diff
will decrease, since the flux to the areas with higher feature density will decrease faster than
the flux to the flat areas.
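As a minimal sketch of how the fractional flux difference could be evaluated from a computed flux profile, the following Python function takes the net fluxes a(x) S(c, x, 0) sampled along the wafer surface. The function name and the sample values are hypothetical and serve only as an illustration; they are not taken from the simulations reported here.

```python
def fractional_flux_difference(fluxes):
    """Return F_diff = (F_max - F_min) / F_min for net fluxes sampled along x."""
    f_max = max(fluxes)
    f_min = min(fluxes)
    if f_min <= 0.0:
        raise ValueError("F_min must be positive for the ratio to be defined")
    return (f_max - f_min) / f_min


# Hypothetical flux samples along the wafer (arbitrary units):
# 1.0 in the flat areas, up to 2.3 at the edges of a feature cluster.
sample_fluxes = [1.0, 1.0, 2.3, 2.1, 2.3, 1.0, 1.0]
f_diff = fractional_flux_difference(sample_fluxes)
print(f"F_diff = {f_diff:.2f}")
```

Because F_min is attained in the flat areas, the result can be read directly as the relative excess loading inside the clusters.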


Figure  9  shows  the  Damkoehler  number  Da  versus  total  pressure  with  the  temperature 
parameterized.  Its  basic  purpose  is  to  measure  the  influence  of  transport  limitations  on  the 
deposition process [13]. The classical definition arises in the non-dimensionalization of the
flux boundary condition at the wafer surface as Da = (R_ref L_ref)/(D_ref c_ref); in the context of
feature scale models, a similar quantity is known as the step coverage modulus [14, 15]. This
definition  can  be  used  for  a  model  on  any  length  scale,  since  no  information  about  the  surface 
structure  is  used.  More  generally,  the  purpose  of  the  Damkoehler  number  is  to  measure  the 
characteristic  deposition  rate  versus  the  characteristic  transport  rate.  We  propose  therefore 
to  take  into  account  the  information  available  about  the  surface  on  the  mesoscopic  scale 
under  consideration.  In  this  spirit,  we  define  the  following  generalized  Damkoehler  number 
for  general  three-dimensional  surfaces 


\[ Da = \frac{A_D \, R_{\mathrm{ref}} \, L_{\mathrm{ref}}}{A_T \, D_{\mathrm{ref}} \, c_{\mathrm{ref}}} \tag{32} \]


where A_D denotes the characteristic deposition area and A_T the characteristic transport
area. A_D is then the total surface area available for deposition, i.e., the true surface
area of the patterned surface, while A_T is the area that determines the transport limitations,
i.e., the flat surface area. All other quantities are reference quantities arising in the
non-dimensionalization procedure as in the classical definition. For the two-dimensional
representation of the surface used here (infinite trenches), the ratio A_D/A_T is equal to
a_total = \int a(x) dx, taken along the wafer surface, since this measures the total increase
in surface area due to the patterns. Hence, the following formula gives the generalized
Damkoehler number used

\[ Da = \frac{a_{\mathrm{total}} \, R_{\mathrm{ref}} \, L_{\mathrm{ref}}}{c_{\mathrm{ref}} \, D_{\mathrm{ref}}} \tag{33} \]

For low temperatures and pressures, we expect the deposition to be reaction rate limited,
i.e., the Damkoehler number should be low. On the other hand, for high temperatures and
pressures,  the  Damkoehler  number  is  expected  to  be  high  to  indicate  transport  limitations. 
Both  effects  are  exhibited  by  the  Damkoehler  number  in  Figure  9. 


Furthermore,  comparing  the  different  surface  patterns,  we  note  that  the  cross-over  to 
the transport limiting conditions (Da ≈ 1) occurs earlier for more patterned wafer surfaces.
This  is  due  to  the  more  difficult  transport  into  the  feature  clusters  in  these  cases.  This  is 
an  important  consideration  when  comparing  models  on  the  scale  of  feature  clusters,  and  the 
classical  definition  of  the  Damkoehler  number  does  not  capture  this  effect,  since  its  value 
would  be  the  same  for  all  surfaces. 


The increase in temperature affects the reaction rate term in the definition of Da in an
exponential way, while the diffusivity contains the term T^{3/2}; the influence of the exponential
function is dominant for the values of the temperature considered here, hence the exponential
increase with respect to temperature seen in Figure 9. On the other hand, the change in total
pressure only impacts the diffusivity, since all other terms (including c_ref) depend only on
the partial pressure of TEOS. The diffusivity is inversely proportional to the total pressure
(see equation (19)), hence the Damkoehler number increases linearly with total pressure as shown
in Figure 9.
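The scaling argument above can be illustrated with a short Python sketch. It assumes an Arrhenius reference rate and a diffusivity proportional to T^{3/2} / P_total, and it holds c_ref and L_ref fixed as a simplification. The function name and every numerical constant (k0, E_a, D0, T0, P0) are hypothetical placeholders, not the values used in this study.

```python
import math

R_GAS = 8.314  # universal gas constant, J / (mol K)


def generalized_damkoehler(T, P_total, area_ratio, *, k0=1.0e6, E_a=1.5e5,
                           D0=100.0, T0=1000.0, P0=5.0, L_ref=1.0, c_ref=1.0):
    """Sketch of Da = (A_D / A_T) * R_ref * L_ref / (D_ref * c_ref).

    All keyword defaults are hypothetical placeholders.
    """
    R_ref = k0 * math.exp(-E_a / (R_GAS * T))        # Arrhenius reference rate
    D_ref = D0 * (T / T0) ** 1.5 * (P0 / P_total)    # D ~ T^(3/2) / P_total
    return area_ratio * R_ref * L_ref / (D_ref * c_ref)


# Da grows linearly with total pressure and with the surface area ratio,
# and roughly exponentially with temperature.
for P in (1.0, 5.0, 9.0):
    print(P, generalized_damkoehler(1000.0, P, area_ratio=2.3))
```

With these assumptions, doubling the total pressure doubles Da, while the classical definition corresponds to area_ratio = 1 for every surface.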

Figure 10 shows a plot of the effectiveness factor η versus the Damkoehler number Da. η
is defined following [13] as the ratio of the observed deposition rate over the deposition rate
in the absence of gradients. The latter is taken from the reaction rate model as a function
of the reference quantities only, i.e., without considering the geometry of the surface. Thus,
η is another measure of how much transport limitations affect the deposition process. It is
desirable  to  have  this  measure  be  roughly  independent  of  the  specific  surface  structure.  In 
other  words,  we  would  like  to  have  just  one  curve  in  this  graph  encompassing  all  surface 
structures.  Notice  that  this  goal  is  approached  using  the  generalized  Damkoehler  number  as 
defined  in  this  paper;  if  the  classical  Damkoehler  number  were  used,  the  plot  would  split  up 
into  three  branches  at  a  lower  value  of  Da. 

An  Arrhenius  plot  is  included  in  Figure  11.  It  reflects  the  exponential  model  used  for  the 
reaction  rate,  since  it  is  linearly  decreasing  for  low  pressures  for  all  temperatures  and  for  all 
pressures,  if  the  temperature  is  sufficiently  low.  For  higher  pressures  and  high  temperatures, 
the  reaction  rate  per  total  area  is  again  seen  to  be  transport  limited.  Again,  we  observe  that 
the  deposition  rate  per  total  area  is  higher  for  the  flat  wafer  and  lower  for  the  most  highly 
patterned  case  compared  to  the  case  with  four  clusters  of  500  features  each.  Note  that  for 
the  reaction  controlled  regime  at  low  temperatures,  these  differences  become  insignificant, 
thus  demonstrating  that  transport  limitations  play  a  crucial  role  in  the  high  temperature 
cases  (see  also  Figure  7). 

In summary, this study of a representative operating window demonstrates the capabilities
of the mesoscopic model. This section highlights the general reason for using simulations
in that any quantity can be analyzed, whether or not it would actually be accessible in a real
reactor. Also, operating conditions could be chosen outside the normal, safe operating
range of the reactor; however, it became clear in this project that simulations also suffer
in such cases, since the underlying physical estimation procedures can become unreliable if
used outside their range of validity and better procedures are rarely available.


CONCLUSIONS 


A  mesoscopic  scale  model  on  the  scale  of  several  feature  clusters  has  been  introduced. 
Its  mathematical  background  has  been  highlighted.  A  physical  example  chemistry  has  then 
been  used  to  demonstrate  the  power  of  the  model  for  analyzing  feature-to-feature  effects  as 
well  as  to  study  an  extensive  window  of  operating  conditions.  In  its  stand-alone  mode,  the 
model  has  been  shown  to  yield  appropriate  results. 

A generalized Damkoehler number has been introduced. It was demonstrated that this
number characterizes the influence of transport limitations appropriately while using available
information about the surface patterns.

This paper demonstrates the principal capabilities of the method with its implementation
used in stand-alone mode for a single-species chemistry. It is planned to link the implementation
with existing reactor scale and feature scale codes. We also plan to demonstrate the
capabilities of the stand-alone implementation for multicomponent chemistries, as well as for
etch processes.
Symbol      Name                                    Units
D_ij        binary diffusivity coefficient          cm^2 / s
T           temperature                             K
P_total     total pressure                          torr
Ω_D         collision integral                      1
T*          reduced temperature                     1
σ_i         Lennard-Jones parameter of species i    Å
ε_i / k     Lennard-Jones parameter of species i    K
m_i         molecular weight of species i           g / mol
V_i         molar volume of species i               cm^3 / mol
T_i         normal boiling point of species i       K
ρ_i         mass density of species i               g / cm^3

Table 1: Variables used in the diffusivity estimation following Reid [9].
Quantity [units]                           TEOS        Oxygen
molar weight m_i [g / mol]                 208.3083    31.999
liquid molar volume V_i [cm^3 / mol]       222.7896    27.8494
normal boiling point T_i [K]               441         90.2
mass density ρ_i [g / cm^3]                0.935       1.149
Lennard-Jones parameter σ_i [Å]            7.1534      3.5767
Lennard-Jones parameter ε_i / k [K]        507.15      103.73

Table 2: Material constants for the TEOS-oxygen system. The normal boiling point for both
species is taken at 1 atm.
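The derived entries in Table 2 appear to be consistent with the widely quoted boiling-point correlations V = m / ρ, σ = 1.18 V^{1/3} and ε/k = 1.15 T_b. The Python sketch below checks this numerically; that these exact correlations were the ones applied is an assumption inferred from the tabulated numbers, and the function name is hypothetical.

```python
def lennard_jones_estimates(molar_weight, mass_density, boiling_point):
    """Estimate molar volume and Lennard-Jones parameters from liquid data.

    Assumed correlations (inferred from the table, not confirmed by the text):
        V       = m / rho          [cm^3 / mol]
        sigma   = 1.18 * V**(1/3)  [Angstrom]
        eps / k = 1.15 * T_b       [K]
    """
    molar_volume = molar_weight / mass_density
    sigma = 1.18 * molar_volume ** (1.0 / 3.0)
    eps_over_k = 1.15 * boiling_point
    return molar_volume, sigma, eps_over_k


# Input data copied from Table 2.
teos = lennard_jones_estimates(208.3083, 0.935, 441.0)
oxygen = lennard_jones_estimates(31.999, 1.149, 90.2)

for name, (v, s, e) in (("TEOS", teos), ("Oxygen", oxygen)):
    print(f"{name}: V = {v:.4f} cm^3/mol, sigma = {s:.4f} A, eps/k = {e:.2f} K")
```

Comparing the printed values with the table gives a quick consistency check when the material constants are re-entered by hand.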
Figure 1: Schematic of the reactor with flow pattern and typical domains of the models
(panel labels: RSM, MSM, FSM).
Figure 3: Concentration profile for four identical feature clusters with 500 features each.

Figure 4: Concentration profile for four differing feature clusters, two with 1000 features and
two with 500 features.

Figure  5:  Comparison  of  concentration  profiles.  The  solid  lines  depict  the  results  for  four 
feature  clusters  with  500  features  each.  The  dashed  lines  mark  the  results  for  two  clusters 
with  1000  features  and  two  with  500  features. 

Figure 8: Fractional difference in the flux versus total pressure with temperature parameterized.
Partial pressure of TEOS P_TEOS = 0.20 torr fixed.

Abstract

This dissertation presents a problem arising from the simulation of gas flow over
microstructured surfaces. For the industrial application under consideration, the problem
is appropriately given as a time-dependent nonlinear reaction-diffusion equation on a
domain, which includes a flux condition on a boundary surface consisting of a microscopic
fine structure. An equivalent problem for the bulk solution is derived, which
incorporates all physical quantities of interest and which is accessible to efficient numerical
simulation  at  the  same  time.  This  is  achieved  by  applying  a  homogenization  technique 
to  the  boundary  condition,  which  eliminates  the  microscopic  scale  while  retaining  its 
effect  on  the  bulk  solution.  The  derivation  presented  in  this  dissertation  is  valid  for  a 
three-dimensional  domain  and  a  general  boundary  surface  given  in  parameterized  form. 

The  underlying  application  area  in  semiconductor  manufacturing  is  the  modeling 
of  chemical  vapor  deposition  in  single  wafer  reactors.  To  the  end  of  a  global  process 
model,  this  work  introduces  a  mesoscopic  scale  model  intermediate  in  length  scale  to 
the  established  reactor  scale  and  feature  scale  models.  The  numerical  simulation  of  this 
model  is  made  possible  by  the  homogenization  technique  above.  The  model  is  validated 
for  a  mathematical  test  problem  by  comparison  to  a  classical  numerical  solution  with 
full  resolution  of  the  fine  structure.  Furthermore,  two  studies  on  the  physical  example 
of  thermally  induced  deposition  of  silicon  dioxide  from  tetraethoxysilane  (TEOS)  are 
presented.  The  first  study  analyzes  the  effect  of  varying  feature  density,  while  the 
second  one  studies  the  influence  of  varying  operating  conditions  on  important  physical 
parameters.  A  generalized  Damkoehler  number,  which  characterizes  the  amount  of 
transport  limitations  in  a  way  appropriate  for  the  mesoscopic  scale  model,  is  introduced. 
PROGRAMMED RATE CHEMICAL VAPOR DEPOSITION OF
BLANKET TUNGSTEN THIN FILMS

Kathryn  M.  Tracy,  Srikanth  Bolnedi  and  Timothy  S.  Cale 

Department  of  Chemical,  Bio  &  Materials  Engineering  and  Center  for  Solid  State 
Electronics  Research,  Arizona  State  University,  Tempe  AZ  85287 

Executive  Summary 

Conventional,  constant  rate  chemical  vapor  deposition  (CRCVD)  processes 
maintain  constant  pressure  and  temperature  during  processing.  Programmed  Rate  CVD 
(PRCVD)  has  been  proposed  as  a  means  of  increasing  wafer  throughput  relative  to 
CRCVD  processes,  while  maintaining  step  coverage  and  other  critical  film  properties  at 
acceptable  values  [1,2].  For  PRCVD,  the  deposition  rate  is  decreased  as  feature  aspect 
ratio  increases  during  feature  fill  in  order  to  maintain  a  specified  step  coverage.  The 
deposition  rate  is  higher  than  the  CRCVD  rate  at  the  beginning  of  film  deposition,  and  is 
reduced  as  the  aspect  ratio  increases  towards  the  end  of  feature  fill.  The  total  time  required 
to  fill  features  with  good  step  coverage  is  less  than  the  time  required  by  CRCVD  processes 
because  the  average  deposition  rate  is  higher.  We  present  early  results  in  our  experimental 
efforts  to  demonstrate  PRCVD  of  tungsten. 

Extended  Abstract 

The PRCVD process we use for blanket tungsten thin film deposition follows the
protocol described by Cale et al. [1,2]; i.e., we decrease the temperature during deposition
in order to decrease the deposition rate. Unpatterned silicon (100) wafers (125 mm
diameter, p-type) were used to study film characteristics such as thickness, density, grain
size distribution, composition, and stress. All wafers were cleaned in 1% HF for 3
minutes prior to deposition. Blanket tungsten films were deposited in a Spectrum 202,
cold wall, single wafer CVD reactor with a radiant heat source, using the hydrogen
reduction of tungsten hexafluoride. Table 1 lists the conditions used for the first screening
experiment, labeled (I), and Figure 1 shows the temperature trajectory.

Film  thickness  was  determined  for  all  samples  by  Scanning  Electron  Microscope 
(SEM).  Film  thicknesses  were  about  5  microns  (Figure  2).  Sheet  resistance  was  measured 
using a four-point probe, and film resistivities were determined. The resistivities of the
films  (Figure  3)  were  relatively  high  compared  to  CRCVD  samples,  but  within  published 
values.  Film  adhesion  to  the  silicon  substrates  was  good. 

Rutherford Backscattering Spectroscopy (RBS) was used to determine the film
density and composition. The film thickness was first determined by SEM, then RBS
was performed. The density of sample I.3 was 16.1 g/cm^3, compared to the bulk value of
19.2 g/cm^3. The RBS spectrum for this sample (Figure 4) showed a carbon peak at the W-Si
interface, probably due to contamination during the wafer cleaning procedure prior to
film deposition. We were unable to detect oxygen or fluorine in this relatively thick
tungsten film sample due to sensitivity impairment.

Auger Electron Spectroscopy (AES) was used to study the film composition of
sample I.4, on the surface and as a function of depth. The AES spectra for the surface
probe (not shown) indicated that the surface of the film had oxidized, as expected. Carbon
was detected on the surface, possibly due to sample handling prior to the analysis. The
AES depth profile (Figure 5) was limited to 100 nm for practical reasons, as the sputter rate
was slow compared to the sample film thickness. AES did not show a significant oxygen
content at the depth tested.

The microstructure of sample I.2 was studied using Transmission Electron
Microscopy (TEM). A TEM cross section indicated that the grain size varied from 50 nm for the
nucleation layer at the W-Si interface, to 0.5-1.0 micron at intermediate distances to the film
surface.  A  non-uniform  layer  of  tungsten  silicide  was  found  at  the  W-Si  interface.  Voids 
between  the  columnar  tungsten  grains  were  observed. 

Residual stress measurements were performed on samples from a second
experiment, labeled (II). Stress results were compared for films grown using the original
PRCVD temperature trajectory (starting temperature 650°C), PRCVD with a reduced
starting temperature (450°C), and CRCVD with the same temperature as the end-of-deposition
temperature for both PRCVD samples (360°C). In all cases, the total pressure
was 1.6 Torr, H2 flow rate was 500 sccm, and WF6 flow rate was 50 sccm. The high
temperature PRCVD film (II.2) had significantly lower stress (Table 2) than the other two
samples.

The  results  from  the  first  set  of  experiments  (I.*)  provide  information  about  the 
composition  and  structure  of  PRCVD  tungsten  films  on  silicon  substrates.  The  PRCVD 
samples I.3, I.4 and I.5 were compared to CRCVD films deposited at the same pressures,
flow  ratios  and  flow  rates.  The  densities  of  the  PRCVD  films  obtained  to  date  are  lower, 
apparently  due  to  voids  formed  at  the  Si-W  interface.  These  voids  may  not  form  in  the 
presence  of  a  conventional  nucleation  layer,  such  as  TiN.  The  resistivities  of  our  films 
were  relatively  high,  which  may  be  due  to  smaller  average  grain  sizes,  and  the  lower  film 
densities.  The  motivation  for  continuing  to  test  this  protocol  is  apparent  when  the  time  of 
deposition  is  compared.  According  to  this  study,  the  PRCVD  process  will  improve 
throughput  by  at  least  40%  compared  to  CRCVD,  with  equivalent  step  coverage  predicted 
[1,2].  If  film  properties  and  step  coverage  are  good  when  a  nucleation  layer  is  used,  the 
PRCVD  process  would  provide  improved  throughput  and  decreased  consumables. 
Table 1: Experimental conditions for PRCVD screening runs; P = 0.5 Torr.
See Figure 1 for temperature trajectory.

ID #    H2:WF6    Total Flow
I.1     10        176 sccm
I.2     10        352 sccm
I.3     5         180 sccm
I.4     10        528 sccm
I.5     10        704 sccm

Table 2: Stress measurements for CRCVD and PRCVD. P = 1.6 Torr,
H2 = 500 sccm, WF6 = 50 sccm.
ABSTRACT


Improving  wafer  throughput  for  single-wafer,  low  pressure  chemical  vapor 
deposition  (LPCVD)  reactors  is  problematic,  since  process  changes  needed  to  achieve 
high throughput often lead to film property degradation. Conventional LPCVD processes
in general do not have enough degrees of freedom to achieve high throughput and maintain
good  film  properties.  In  this  work,  the  film  properties  resulting  from  a  new  process 
protocol,  programmed  rate  chemical  vapor  deposition  (PRCVD),  are  investigated. 
Blanket  tungsten  (W)  film  deposition,  based  on  the  hydrogen  reduction  of  tungsten 
hexafluoride  (WF6),  is  used  as  a  test  vehicle  for  this  new  protocol.  Results  are  compared 
to  conventional,  constant  rate  CVD  (CRCVD)  thin  films  produced  with  the  same 
equipment. 

Results from this study of blanket tungsten PRCVD show consistent, excellent
adhesion to silicon substrates and to nucleation layers deposited using the silane
reduction of WF6. The protocol is shown to be viable and results in significant time
savings.  Step  coverage  is  shown  to  be  equivalent  to  CRCVD  samples  using  the  same 
equipment.  High  density  tungsten  thin  films  with  low  resistivity  have  been  produced 
using  the  PRCVD  process,  and  the  films  produced  show  significantly  lower  stress  than 
the  CRCVD  samples.  Finally,  SIMS  results  showed  that  the  one  PRCVD  film  analyzed 
had  significantly  lower  fluorine  concentration  compared  to  the  CRCVD  control,  indicating 
that the PRCVD process may exhibit improved chemical purity.
ABSTRACT

For  deposition  processes,  the  goal  is  to  maximize  throughput  while  maintaining 
acceptable  film  properties.  Higher  deposition  rates  in  general  lead  to  poorer  step  coverages 
due  to  reactant  depletion  caused  by  the  higher  reaction  rates,  or  due  to  depletion  along  the 
depth  of  the  features.  Using  feature  scale  simulations,  we  have  proposed  a  protocol  called 
programmed  rate  CVD  (PRCVD).  PRCVD  overcomes  increasing  aspect  ratios  as  features 
approach  closure  during  deposition.  The  initial  rate  can  be  much  higher  than  conventional  rate 
CVD  (CRCVD)  and  is  decreased  during  the  deposition. 

We  tested  the  PRCVD  concept  for  blanket  tungsten  deposition  from  tungsten 
hexafluoride  and  hydrogen,  in  a  lamp-heated,  cold-wall,  single  wafer,  LPCVD  reactor.  We 
found significant time savings (PRCVD took 70% less time than CRCVD processes which give
the same step coverage), and resistivities and stresses were equivalent to or better than those
of CRCVD processes. Thus, it can be concluded that PRCVD is a viable process for tungsten deposition.

PRCVD  PROCESS  vs  CRCVD  PROCESS 

CRCVD  maintains  constant  temperature,  flow  rates  and  pressure.  In 
order  to  improve  throughput  for  CRCVD,  the  deposition  rate  can  be 
increased;  however,  this  can  lead  to  poor  step  coverage.  This  limitation  is 
due  to  using  fixed  deposition  conditions,  although  feature  aspect  ratios 
increase  during  deposition. 

PRCVD  allows  for  programming  deposition  conditions  to  add  degrees 
of  freedom.  With  PRCVD,  deposition  rate  is  decreased  during  deposition  as 
feature  aspect  ratios  increase.  This  allows  the  initial  rate  to  be  much  higher 
than  the  average  rate,  perhaps  as  much  as  an  order  of  magnitude  higher.  A 
major  concern  in  using  the  PRCVD  process  would  be  the  effect  on  film 
properties: uniformity, stress, resistivity, adhesion, composition [1,2].

Experimental  Design 

The  reactor  we  used  is  a  Spectrum  202  Single  Wafer  Reactor.  The  deposition 
chemistry  is  hydrogen  reduction  of  tungsten  hexafluoride, 

WF6  +  3H2  ->  W  +  6HF 

according to the heterogeneous reaction kinetics

\[ R = k_0 \exp\!\left(\frac{-E_a}{RT}\right) \frac{P_{H_2}^{1/2} \, P_{WF_6}}{1 + K_{WF_6} P_{WF_6}} \]

Experiments were carried out on unpatterned, bare silicon wafers and on
unpatterned and patterned silicon dioxide wafers with 50 nanometers of sputtered
titanium nitride. Wafers were passed through an isopropyl alcohol clean after
an HF (50:1) dip prior to processing. After processing, they were measured for
weight gain, resistivity, uniformity, stress and composition.


PRCVD  Control  Construction 


Compared  to  CRCVD,  PRCVD  controls  one  or  more  critical  process 
parameters.  In  this  work,  we  controlled  the  temperature  ramp  rate,  as  seen  in 
a typical program in Figure 1.


Figure  1.  Temperature  Ramp,  PRCVD 

The  temperature  ramp  was  achieved  using  a  closed  loop  Eurotherm 
Controller.  Compared  to  a  constant  temperature  CRCVD,  where  the  process 
starts  and  ends  at  the  same  temperature,  the  PRCVD  process  begins  at  an 
initial  higher  temperature  and  is  ramped  downward  as  feature  closure 
increases  aspect  ratio. 

Using  this  control,  we  achieved  time  savings  from  50  -  70%,  using  the 
following  formula,  [3] 
\[ \text{Time Saved (\%)} = \frac{t_{\mathrm{CRCVD}} - t_{\mathrm{PRCVD}}}{t_{\mathrm{CRCVD}}} \times 100\% \]
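As a worked illustration, the Python sketch below evaluates the time saved as the relative reduction in deposition time, assuming that is what the formula expresses. The function name is hypothetical; the example times of 90 s and 465 s are the PRCVD and CRCVD deposition times reported for the patterned-wafer runs in this paper.

```python
def time_saved_percent(t_crcvd, t_prcvd):
    """Relative reduction in deposition time, in percent (assumed definition)."""
    if t_crcvd <= 0.0:
        raise ValueError("CRCVD time must be positive")
    return (t_crcvd - t_prcvd) / t_crcvd * 100.0


# Deposition times in seconds for a CRCVD run and a PRCVD run.
saved = time_saved_percent(t_crcvd=465.0, t_prcvd=90.0)
print(f"time saved: {saved:.1f} %")
```

The result depends on which pair of runs is compared, so different run pairs can give different percentages.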

The SEM results shown here are for 450 °C, which is compatible with the
lower temperatures required for aluminum. Our work has tested initial
starting temperatures of 450 °C, 550 °C, and 650 °C, with temperature ramps
ranging from -1 to -5 °C / second.
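A temperature program of this kind can be sketched as a setpoint function. The example below assumes a linear ramp that starts at the initial temperature and is held once it reaches an end-of-deposition temperature; the linear shape, the 360 °C floor, and the function name are assumptions made for illustration, since the actual controller program is only shown graphically.

```python
def prcvd_setpoint(t, T_start=450.0, T_end=360.0, ramp_rate=-1.0):
    """Temperature setpoint in deg C at time t (s) for an assumed linear ramp.

    ramp_rate is in deg C per second and must be negative (ramping down).
    """
    if ramp_rate >= 0.0:
        raise ValueError("ramp_rate must be negative for a downward ramp")
    return max(T_end, T_start + ramp_rate * t)


# Setpoints over a 90 s deposition for the slowest and fastest tested ramps.
for rate in (-1.0, -5.0):
    profile = [prcvd_setpoint(t, ramp_rate=rate) for t in (0.0, 10.0, 30.0, 90.0)]
    print(rate, profile)
```

A faster ramp reaches the holding temperature sooner, so more of the deposition takes place at the lower rate.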


Figure 3. SEM of PRCVD results, t = 90 s, P = 0.5 Torr, T = 450 °C

Figure 4. SEM of CRCVD results, t = 465 s, P = 0.5 Torr, T = 360 °C


Run  Process             Aspect  Thickness     Thickness     Step      Deposition
     (Pressure, Torr;    Ratio   at surface    at sidewall   Coverage  Time
     Temp, °C)                   (t, micron)   (B, micron)   (B/t)     (seconds)

1    PRCVD (2, 450)      1.60    0.12          0.12          1.0       90
2    CRCVD (2, 360)      1.58    0.10          0.10          1.0       465
3    PRCVD (0.5, 450)    1.20    0.16          0.15          0.94      90
4    CRCVD (0.5, 360)    1.25    0.19          0.19          1.0       465
5    PRCVD (5, 360)      1.67    0.29          0.27          0.93      90
6    CRCVD (5, 360)      1.36    0.32          0.28          0.88      465
7    PRCVD (2, 450)      1.45    0.26          0.21          0.81      90

Table 1: Experiments on Patterned TiN Wafers
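The step coverage column of Table 1 is the ratio of sidewall thickness B to surface thickness t. The Python sketch below recomputes it from the tabulated thicknesses as a consistency check; the variable and function names are hypothetical, and the data are copied from the table.

```python
# (run, process, surface thickness t, sidewall thickness B, tabulated B/t)
RUNS = [
    (1, "PRCVD", 0.12, 0.12, 1.0),
    (2, "CRCVD", 0.10, 0.10, 1.0),
    (3, "PRCVD", 0.16, 0.15, 0.94),
    (4, "CRCVD", 0.19, 0.19, 1.0),
    (5, "PRCVD", 0.29, 0.27, 0.93),
    (6, "CRCVD", 0.32, 0.28, 0.88),
    (7, "PRCVD", 0.26, 0.21, 0.81),
]


def step_coverage(sidewall, surface):
    """Step coverage SC = B / t (sidewall over surface thickness)."""
    return sidewall / surface


# Largest difference between recomputed and tabulated step coverage.
deviations = [abs(step_coverage(b, t) - sc) for _, _, t, b, sc in RUNS]
max_deviation = max(deviations)
for (run, process, t, b, sc) in RUNS:
    print(run, process, f"{step_coverage(b, t):.3f}", sc)
```

All recomputed ratios agree with the tabulated values to within rounding at two decimal places.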


Experimental  Results 

Secondary  Ion  Mass  Spectroscopy  (SIMS)  analysis  revealed  that 
PRCVD  shows  promise  of  lower  fluorine  incorporation  than  CRCVD. 
Residual  stress  increased  with  increasing  pressure,  as  did  weight  gain  and 
hence  density.  Higher  pressure  also  decreased  non-uniformity. 

Uniformities  are  about  the  same  for  PRCVD  and  CRCVD;  we  achieved 
step  coverages  approaching  and  equal  to  SC  =  1  (where  SC  is  the  ratio  of 
sidewall  coverage  to  surface  coverage)  with  considerable  time  savings  by 
using  PRCVD. 


Conclusions 

PRCVD  protocol  simulations  using  EVOLVE  (a  low  pressure  deposition 
simulator)  theoretically  predicted  tungsten  step  coverage  equivalent  to 
CRCVD  processes.  A  major  concern  for  using  PRCVD  would  be  to  sustain 
film  properties  at  CRCVD  levels,  with  improvement  or  at  the  very  least 
without  degradation.  Here,  we  have  shown  an  82%  time  savings,  while 
preserving  critical  wafer  parameters  of  stress,  uniformity,  resistance  and 
density.  Thus,  it  can  be  concluded  PRCVD  is  a  viable  process  for  tungsten 
deposition. 

Abstract

A feed-forward and adaptive feedback control methodology is developed
and experimentally applied to several different processes commonly used
in the fabrication of semiconductor integrated circuit devices. A
circular parallel-plate capacitor with a glass (SiO2) dielectric is
manufactured on silicon wafers to illustrate the use of these control
strategies in the processes of silicon oxidation, aluminum
metallization, lithography, and aluminum etching. The goal is to
maintain a constant capacitance value on a run-to-run basis regardless
of disturbances or modeling errors in the processes.


1  Introduction 

As we move into the twenty-first century, the semiconductor industry
will require processing equipment with the capability to fabricate
devices with 0.2 micron features. To manufacture devices of this size,
it is necessary to maintain very tight processing tolerances that, in
turn, will inevitably require the application of suitable control
methodologies to these processes.

Recent studies have demonstrated the potential of feedback control laws
in maintaining the desired process characteristics for several types of
processes encountered in semiconductor manufacturing. These include
real-time control strategies aiming to maintain a constant processing
environment [1, 2, 3, 4, 5], run-to-run strategies aiming to adjust the
process inputs so as to obtain the desired product specifications
[6, 7, 8] and monitoring/diagnostic/control schemes [9].

In this study we adopt a run-to-run feed-forward and adaptive feedback
control strategy and implement it for the overall sequence of processing
steps in the experimental manufacturing of a parallel-plate capacitor.
This processing sequence involves four basic steps which are commonly
used in semiconductor manufacturing processes, namely, silicon
oxidation, aluminum metallization, lithography and aluminum etch. In the
first step (silicon oxidation), the glass dielectric or oxide layer of
the capacitor is grown on a bare silicon wafer. In the second step
(aluminum metallization), aluminum, the material of the top plate of the
capacitor, is deposited over the oxide grown in the first step. In the
third step (lithography), the photoresist pattern of the top capacitor
plate is formed over the aluminum layer. In the fourth and final step
(aluminum etch), the aluminum that is not protected by the photoresist
is etched away to physically form the top plate of the capacitor. After
completing each of these steps, a suitable output of the process is
measured in order to assess the partial state of the wafer and determine
the control actions to be taken. The objective is to maintain a
constant, desired value of the capacitance, which can be found by means
of an electrical measurement after the completion of all four steps.

From static electric field theory, the capacitance value for a
parallel-plate capacitor can be expressed as [10]

$$C = \frac{Q}{V_{12}} = \frac{\epsilon S}{d} \qquad (1)$$

where $Q$ is the absolute charge on the capacitor plates, $V_{12}$ is
the potential difference between the two capacitor plates, $\epsilon$ is
the permittivity constant of the dielectric, $S$ is the area of the
capacitor plates, and $d$ is the dielectric thickness. The relationship
between the parallel plate area and the dielectric thickness provides us
with the opportunity to apply feed-forward control to maintain a
constant capacitance value for a variable dielectric thickness by simply
adjusting the capacitor plate area. In this study, feed-forward control
is used in both the lithography and aluminum etch steps of the capacitor
fabrication process.
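
The feed-forward idea above amounts to solving the parallel-plate relation C = εS/d for the plate area once the oxide thickness has been measured. The following is a minimal sketch under stated assumptions: the relative permittivity of 3.9 for thermal SiO2, the nominal thickness and radius, and all function names are illustrative choices made here and are not taken from the study.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9   # assumed relative permittivity of thermal SiO2


def capacitance(area_m2, thickness_m, eps_r=EPS_R_SIO2):
    """Ideal parallel-plate capacitance C = eps * S / d (fringing neglected)."""
    return eps_r * EPS0 * area_m2 / thickness_m


def feedforward_area(c_target, measured_thickness_m, eps_r=EPS_R_SIO2):
    """Plate area that restores the target capacitance for the measured oxide."""
    return c_target * measured_thickness_m / (eps_r * EPS0)


def plate_radius(area_m2):
    """Radius of a circular plate with the given area."""
    return math.sqrt(area_m2 / math.pi)


# Hypothetical nominal design: 100 nm oxide, 500 micron plate radius.
nominal_thickness = 100e-9
nominal_radius = 500e-6
c_target = capacitance(math.pi * nominal_radius**2, nominal_thickness)

# Suppose the oxidation step overshoots by 5 percent.
measured_thickness = 105e-9
area = feedforward_area(c_target, measured_thickness)
print(f"corrected radius: {plate_radius(area) * 1e6:.1f} micron")
```

Because capacitance scales with S/d, a thickness error of a given percentage is offset by the same percentage change in area, which is why a simple algebraic feed-forward law suffices in this ideal model.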

On the other hand, the development of a simple, run-to-run, adaptive
feedback control law has been studied for a silicon oxidation process to
improve performance and compensate for equipment drifts and modeling
errors [11]. In order to illustrate its versatility, we apply the same
controller design principle to a physically based model for the silicon
oxidation step and an empirical model for the aluminum etch step. The
parameters of the physically based model used in the silicon oxidation
step are initially chosen far from the actual parameter values of the
system to demonstrate the adaptive capability of the controller when the
actual model parameters are not well known. This selection also serves
to demonstrate the corrective capability of the feed-forward controllers
in the lithography and aluminum etching steps. In both cases, the
adaptive-feedback controllers have the ability to compensate for process
drifts, at least limited ones, while preserving the performance of the
system.

Figure 1: Typical Feedforward/Feedback Controller Configuration.

This paper is organized as follows. The feed-forward and adaptive
feedback controller design methodology is discussed in Section 2.
Section 3 contains a description of the fabrication steps of the
capacitor and some details regarding the control of these processes. The
experimental results are presented in Section 4 while Section 5 contains
concluding remarks and directions for future research.


2  Control  Methodology 

The feed-forward and adaptive feedback control methodology used in this
study consists of two separate controllers that are used in conjunction
with each other or separately depending upon the application. A typical
configuration of the feed-forward and adaptive feedback controllers,
relative to a given process step, is shown in Fig. 1 and further details
on their design are discussed below.

2.1  Adaptive-Feedback Controller
Following [11], a simple, run-to-run, adaptive feedback controller can
be used to compensate for certain types of modeling errors as well as
process drifts. The implementation of this controller relies on a
parametric model of the process. The derivation of such a model can be
either physically based or empirical, i.e., based on data obtained
experimentally from the system. The parameters obtained off-line from a
physically based mathematical model of the process are at best an
approximation, often requiring “fine-tuning” to improve the accuracy of
the description. On the other hand, the parameters of an empirical model
are typically obtained from a least-squares fit of several input-output
data points to some suitable function. In both cases, the development of
an accurate model requires extensive and expensive experimentation. This
can be partially avoided by performing the final fine-tuning of the
model parameters by means of an on-line adaptive scheme.

For example, assuming that $f$ is a known, locally well-behaved function
that describes the process (i.e., for some $\theta_*$, the input-output
pairs $(u, x)$ around an operating point satisfy $u = f(x, \theta_*)$),
the adaptive feedback controller is given as

$$u_n^k = f(x_n^{*k}, \theta_n^k) \qquad (2)$$

where $u_n^k$ is the control, $\theta_n^k$ is the vector of the
parameter estimates and $x_n^{*k}$ is the desired output of the process.
The subscript $n$ denotes the station number and the superscript $k$
denotes the wafer or run number. The parameter estimates are then
updated based on the actual measurements so as to achieve convergence of
the output to (or near) the desired set-point as $k \to \infty$. From
[11], such an adaptive law is

$$\theta_n^{k+1} = \theta_n^k + \gamma^k (x_n^k - x_n^{*k}) \qquad (3)$$

where $\theta_n^{k+1}$ is the updated model parameter that will be used
for the next wafer, $x_n^k$ is the output measurement (feedback) of the
process, and $\gamma^k$ is defined as


where $z^{\#} = z^T [z z^T]^{-1}$ for a row vector $z$. Notice that to
ensure that such an adaptive scheme is “well-behaved” in the presence of
inevitable modeling errors and measurement noise, suitable dead-zone and
parameter projection modifications should be incorporated in the
adaptation [12]. These issues are not discussed here since, due to the
limited number of experiments, they are of minor consequence.
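
The run-to-run loop described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the linear inverse model `inverse_model`, the simulated plant, and all numerical values are hypothetical, and the parameter update applies the plain pseudo-inverse step $z^{\#} = z^T [z z^T]^{-1}$ to the model residual rather than the specific gain of [11]. Dead-zone and projection modifications are omitted.

```python
def inverse_model(x, theta):
    """Hypothetical inverse process model u = f(x, theta) = theta[0]*x + theta[1]."""
    return theta[0] * x + theta[1]


def run_to_run(theta_true, theta_hat, x_desired, n_runs):
    """Simulate n_runs wafers; return final estimates and per-run output errors."""
    theta_hat = list(theta_hat)
    errors = []
    for _ in range(n_runs):
        # Certainty-equivalence control: the input the current model
        # associates with the desired output.
        u = inverse_model(x_desired, theta_hat)
        # Simulated plant: the output that the true parameters pair with u.
        x_meas = (u - theta_true[1]) / theta_true[0]
        errors.append(x_meas - x_desired)
        # Regressor z = df/dtheta evaluated at the measured output.
        z = [x_meas, 1.0]
        residual = u - inverse_model(x_meas, theta_hat)
        z_zt = sum(zi * zi for zi in z)
        # theta <- theta + z^# * residual, with z^# = z^T [z z^T]^{-1}.
        theta_hat = [th + zi * residual / z_zt for th, zi in zip(theta_hat, z)]
    return theta_hat, errors


theta_hat, errors = run_to_run(theta_true=(2.0, 1.0), theta_hat=(1.5, 0.5),
                               x_desired=5.0, n_runs=6)
for k, err in enumerate(errors):
    print(f"run {k}: output error = {err:+.2e}")
```

After each update the model reproduces the latest input-output pair exactly, so in this noise-free example the output error shrinks quickly even though the parameter estimates need not converge to the true values.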

2.2  Feed-Forward Controller
The feed-forward controller is used to determine the desired output of
the current step from the results obtained from the previous processing
steps. Its objective is to adjust the desired output of the current step
so as to partially compensate for errors occurring in any of the
previous steps. This controller is typically a simple expression which
provides a relationship between the output of upstream processing steps
and the desired output of the current step or downstream processing
steps, i.e.,

$$x_n^{*k} = h(x_1^k, \ldots, x_{n-1}^k)$$

where $h$ is a known function, $x_n^{*k}$ is the desired output of the
current processing step and $x_1^k, \ldots, x_{n-1}^k$ are the output
measurements of the previous processes. Although in our study the design
of the feed-forward controllers is rather straightforward (see the
following section for details), notice that in general, the computation
of $h$ may be quite involved and require the on-line solution of an
optimization problem.
Figure 2: Oxidation time and parameter estimates.

Figure 3: Desired vs. achieved oxide thicknesses.

rication step. An empirical feature scale model was developed to predict
the radial overetching capability of the RIE process with respect to
time. All other variables such as RF power, gas flow rates, and pressure
were held constant. The resulting radial overetch model is parabolic,
requiring a second order polynomial with three parameters to describe
[13]. These three parameters were determined using a least-squares fit
on the experimental data obtained off-line from the RIE system.
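
A least-squares fit of a three-parameter parabola, followed by inversion of the fitted model to obtain an etch time, can be sketched as below. The data are synthetic and generated from made-up coefficients purely to exercise the code; the function names and units are assumptions, and nothing here reproduces the actual RIE measurements or model values.

```python
import math


def fit_quadratic(ts, rs):
    """Least-squares fit r ~ a*t**2 + b*t + c via the 3x3 normal equations."""
    rows = [[t * t, t, 1.0] for t in ts]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atr = [sum(r[i] * y for r, y in zip(rows, rs)) for i in range(3)]
    aug = [ata[i] + [atr[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, 3):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution.
    p = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = aug[i][3] - sum(aug[i][j] * p[j] for j in range(i + 1, 3))
        p[i] = s / aug[i][i]
    return tuple(p)


def etch_time_for(overetch, a, b, c):
    """Larger root of a*t**2 + b*t + c = overetch (assumes a > 0)."""
    disc = b * b - 4.0 * a * (c - overetch)
    if disc < 0:
        raise ValueError("requested overetch is not reachable with this model")
    return (-b + math.sqrt(disc)) / (2.0 * a)


# Synthetic, noise-free data from hypothetical coefficients (arbitrary units).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
overetch = [0.02 * t * t + 0.3 * t + 0.1 for t in times]
a, b, c = fit_quadratic(times, overetch)
print(a, b, c, etch_time_for(1.62, a, b, c))
```

Inverting the fitted parabola is what lets a run-to-run controller translate a desired radial overetch into an etch time.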


4  Experimental  Results 

In this section, we discuss the performance of the feed-forward and
adaptive-feedback controllers, applied to the silicon oxidation,
lithography, and aluminum etching steps.

In the silicon oxidation step, we applied an adaptive feedback
controller to the RTP system. The model parameters were chosen in such a
way as to illustrate the adaptive capability of the controller as well
as to introduce a disturbance into the fabrication process to show the
feed-forward capability in the lithography and aluminum etching steps.
The response of the oxidation time and the three parameter estimates
generated by the controller are shown in Figure 2. The model parameters
and oxidation time converge rapidly within six samples. The resulting
oxide thicknesses for each wafer (x) and the corresponding desired value
(o) are shown in Figure 3. After the initial adaptation transient, the
oxide thickness remains within 0.5 percent of the desired value. The
main causes of this variation are that the oxidation time is rounded to
the nearest whole second and the measurement of the oxide thickness is
accurate to within two angstroms.

In the lithography step, we applied a feed-forward controller to adjust
the size of the resist pattern on the top plate of the capacitor. This
controller aims to compensate for small variations in the oxide
thickness by adjusting the focus of the stepper printer. Because of
apparatus-related limitations, the values of such an adjustment are
limited to only ±4% or ±2% of the resist pattern area. Therefore, it is
not possible to completely correct large oxide thickness errors or
correct the plate area to the exact size that is required. After this
step, it is necessary to overetch in order to obtain the exact plate
area desired. To assess the performance of the controller in this step,
the corrected or defocused plate area (x), the desired plate area (o)
and the actual plate area of the image on the mask (+) are shown in
Figure 4. Note that due to the limitations of the apparatus, there is
little to be done about the large oxide thickness errors occurring in
the first three wafer runs. In the later runs, however, the smaller
oxide thickness errors are easier to correct.

Figure 4: Desired, corrected and mask plate areas in the lithography
step.

In the aluminum etching step, we used both feed-forward and adaptive
feedback controllers to reduce the area of the top plate of the
capacitor by overetching the aluminum layer. The response of the etch
time is shown in Figure 5.a. (The model parameters remain virtually
unchanged during adaptation, since their initial guess, obtained by an
off-line least-squares fit, was fairly accurate.) The fluctuations of
the etch time show the amount of radial overetching required to adjust
the plate area. In the second processing run, the plate area needed to
be larger than the resist pattern, so the controller performed a minimal
etch to form the capacitor plate. This is due to the fact that the plate
area can only be reduced through overetching the aluminum layer. The
resulting radial overetch measurement for each wafer (x) and the desired
radial overetch value (o) are shown in Figure 5.b. It should be
mentioned at this point that there is an inherent performance limitation
in this step since the measurement of the radius of the capacitor plate
is only accurate to within ±0.5 microns.

Figure 5: Aluminum Etch step: a. Etch time; b. Actual and desired
overetch.

Finally, the capacitance value was calculated from the resulting
measurements of the oxidation layer and the top plate area of the
capacitor. A comparison between the calculated capacitance (+) and the
desired capacitance value (o) for each wafer run is shown in Figure 6.a.
After the initial adaptation transient, the resulting capacitance values
had a maximum variation of 0.5 percent from the desired value. However,
when the actual capacitance of these devices was measured, the results
were significantly different from the calculated capacitance values.
This deviation is attributed to the electric field fringing effects that
are neglected in the derivation of equation (1). In the fabricated
device, the contribution of fringing is rather significant, increasing
the effective area of the top capacitor plate and hence the capacitance.
One possible remedy of this problem is to “close” the outer loop by
using an adaptive feedback controller to adjust the dielectric
permittivity constant based on the values of measured and calculated
capacitance. This approach was not implemented in this study because the
electrical measurements were performed after the completion of all
processing steps for all the wafers. Note that the validity of this
argument becomes evident since the ratio of measured versus calculated
capacitance is fairly constant: its average value is 2.08 with standard
deviation 0.023. Nevertheless, it should be emphasized that the overall
control scheme achieved a fairly consistent capacitance value (within
1 percent) despite the limitations of the various measurement devices.

Figure 6: Capacitance values: a. Calculated vs. desired; b. Measured.


5  Conclusions

The present study demonstrates the potential of using feed-forward and
adaptive feedback run-to-run control techniques to manufacture
semiconductor devices with desired characteristics. In the experimental
application of this approach, run-to-run controllers were implemented in
almost every step of the complete manufacturing process with very
encouraging results. In order to establish the viability of such a
control scheme, further studies should address the issues of the
controller behavior under more adverse conditions (e.g., “accidental”
de-tuning of a process step, changes in operating conditions, processing
drifts etc.) and for larger numbers of processed wafers.


Acknowledgments 


The research performed on this project by P. Crouch, M. Kozicki, and
K. Tsakalis has been partially supported by AFOSR (ARPA)
(F49620-93-1-0062). The research performed by K. Stoddard on this
project has been completely supported by AFOSR ASSERT (ARPA)
(F49620-93-1-0524DEF).


References 

[1]  T. Breedijk, T. Edgar and I. Trachtenberg, “A Model Predictive
Controller for Multivariable Temperature Control in Rapid Thermal
Processing,” Proc. 1993 ACC, San Francisco, CA.

[2]  S.A. Norman and S.P. Boyd, “Multivariable Feedback Control of
Semiconductor Wafer Temperature,” Proc. 1992 ACC, Chicago, IL.

[3]  C.D. Schaper, “Real-Time Control of Rapid Thermal Processing
Semiconductor Manufacturing Equipment,” Proc. 1993 ACC, San Francisco,
CA.

[4]  M. Elta et al., “Applications of Control to Semiconductor
Manufacturing: Reactive Ion Etching,” Proc. 1993 ACC, San Francisco, CA.

[5]  A. Srinivasan, C. Batur and R. Veillette, “Projective Control
Design for Multi-Zone Crystal Growth Furnace,” Proc. 1993 ACC, San
Francisco, CA.

[6]  R.A. Soper, D.A. Mellichamp and D.E. Seborg, “An Adaptive Nonlinear
Control Strategy for Photolithography,” Proc. 1993 ACC, San Francisco,
CA.

[7]  S. Watts Butler and J. Stefani, “Application of Predictor Corrector
Control to Polysilicon Gate Etching,” Proc. 1993 ACC, San Francisco, CA.

[8]  E. Sachs, A. Hu and A. Ingolfsson, “Run by Run Process Control:
Combining SPC and Feedback Control,” submitted to IEEE Trans. Semicond.
Manuf., Oct. 1991.

[9]  C.J. Spanos, S. Leang and S. Lee, “A Control and Diagnosis Scheme
for Semiconductor Manufacturing,” Proc. 1993 ACC, San Francisco, CA.

[10]  D. Cheng, Field and Wave Electromagnetics, Addison-Wesley, New
York, 1989.

[11]  K. Tsakalis and P. Crouch, “A Simple Adaptive Controller for an
Oxidation Process,” Proc. 1993 ACC, Chicago, IL.

[12]  G.C. Goodwin and K.S. Sin, Adaptive Filtering Prediction and
Control, Prentice Hall, Englewood Cliffs, NJ, 1984.

[13]  W. Runyan and K. Bean, Semiconductor Integrated Circuit Processing
Technology, Addison-Wesley, New York, 1990.



WP-3  5:50 


Proceedings  of  the  33rd 
Conference  on  Decision  and  Control 
Lake Buena Vista, FL - December 1994

Set-Membership  Estimation  for  Weakly  Nonlinear  Models: 

An  Application  to  the  Adaptive  Control  of  Semiconductor 

Manufacturing  Processes* 

Kostas  S.  Tsakalis  and  Lijuan  Song 

Arizona  State  University 
Center  for  Systems  Science  and  Engineering 
Box  877606,  Tempe,  AZ  85287-7606 
E-mail:  tsakalis@enuxsa.eas.asu.edu 


Abstract 

In this paper we consider an application of set-membership concepts to
the parameter estimation problem for weakly nonlinear models. We develop
a recursive algorithm that, given input-output data, a bound on the
measurement noise and a local bound on the Hessian of the nonlinear
model with respect to the unknown parameters, produces a consistent
ellipsoid containing the “actual” model parameters. To illustrate the
use of this algorithm, we consider the process of oxidation of silicon
in dry oxygen where the oxidation time is determined by means of a
simple adaptive controller. In an effort to reduce the parametric
uncertainty, we employ an auxiliary set-membership estimator to update
the set of parameter constraints on-line and, thus, avoid unnecessary
drifts of the adaptive controller parameters.

1  Introduction 

Gate oxidation is one of the frequently encountered and critical process
steps in any mainstream metal oxide semiconductor integrated circuit
fabrication. Briefly described, the gate oxidation step is performed as
follows: A batch of silicon wafers is introduced in an oxidation chamber
and oxidized at high temperatures in an oxygen atmosphere for a
predetermined time period. Upon the completion of the processing step,
the wafers are removed from the oxidation chamber, the produced oxide
thickness is measured and the sequence is repeated for the next batch.
The objective of this treatment is to grow a silicon oxide layer of
prescribed thickness on the silicon which, in turn, is a critical factor
in determining important device parameters such as threshold voltage.
Note that although subsequent processing steps can be performed in order
to compensate for small deviations from the desired oxide thickness, the
tolerance of such deviations is relatively narrow.

In order to control such a process, a simple adaptive scheme based on
fixed-point theorem ideas was proposed in [1] yielding local exponential
convergence of the oxide thickness to the desired set point. Due to the
nonlinear nature of the process model, an important issue arising in
this or similar applications is to ensure that the parameter estimates
remain within the region where the approximate model of the process is
valid. One approach to alleviate such a problem is to introduce a
projection modification in the adaptive algorithm so that the parameter
estimates are constrained in some compact set which, effectively,
describes the parametric uncertainty in the process model. In the
selection of this set it is of course advantageous to obtain the
“smallest” possible one so as to avoid unnecessarily large perturbations
away from the nominal point, caused by the growth of the higher order
terms. This, in turn, motivates the study of set-membership estimation
methods whose objective is precisely to determine a parametric
uncertainty set, compatible with the available input-output (I/O) data
and noise bounds. Note that such methods can be either off-line (batch)
or on-line (recursive). The former can be useful in determining a gross
estimate of the parametric uncertainty set based on a limited number of
data while the latter can serve a “fine-tuning” purpose as more data
become available.

The set-membership estimation problem has received a great deal of
attention in recent years [2]-[15]. Naturally, most of the available
results assume linear-in-the-parameters (LITP) models where, under a
bounded noise assumption, the convexity of the parametric uncertainty
set allows for simple and recursive ellipsoidal approximations (e.g.,
[2, 3]). In the nonlinear (NLITP) case, however, an exact representation
of the feasible solution set is in general not simple, since it may not
be convex. The use of simply shaped sets, like axis-aligned boxes or
ellipsoids, has been proposed to approximate the feasible solution set
[4, 7, 8]. In [5, 6, 9], the characterization of the feasible set for
the parameters was based on interval analysis, which is suitable for
linear or nonlinear systems. These algorithms, however, rely on the
batch processing of all available data and are not suitable for on-line
updates of the parametric uncertainty set.

In this paper we consider the parameter estimation problem for NLITP
models. We design a recursive set-membership estimator by linearizing
the model of the process (with respect to the parameters) while the
contribution of the higher order terms together with the possible
measurement noise is viewed as the effective perturbation. Assuming the
a priori knowledge of an interval containing the eigenvalues of the
Hessian, the set containing the feasible parameters can be approximated
by the intersection of the current parametric uncertainty set and the
exteriors of two spheres. Further, using a two-step optimization scheme,
we obtain a two-half-space intersection containing the feasible
parameter set, for which the usual ellipsoidal approximations are
applicable. It should be noted that in this approach, the bound on the
effective perturbation is not constant but decreases with a reduction of
the size of the parametric uncertainty set. However, such a reduction
may not be possible if the interval containing the eigenvalues of the
Hessian is large relative to the size of the initial parametric
uncertainty. Consequently, the applicability of the method is limited to
weakly NLITP models, that is, models for which, in the given range of
operating conditions and parametric uncertainty, the contribution of the
higher-order terms is sufficiently small.

Finally, to illustrate the use of this set-membership estimator, we
consider the adaptive control problem of an oxidation process. For this
application and following the approach of [16], the updated parametric
uncertainty sets are employed as adaptation constraints, in an effort to
reduce unnecessary excursions of the adaptive controller parameters.


2  Set-Membership  Estimation 
for  Weakly  NLITP  Models 

Consider the model

$$y_k = f(x_k, \theta_*) + \epsilon_k \qquad (1)$$

where $f \in C^2$, $y_k$, $x_k$ and $\epsilon_k$ are bounded sequences,
$x_k \in \mathcal{X} \subset \mathbb{R}^n$,
$y_k \in \mathcal{Y} \subset \mathbb{R}$,
$\theta_* \in \mathcal{M} \subset \mathbb{R}^m$ is the unknown constant
parameter vector, $\mathcal{M}$ is a convex bounded set (typically an
ellipsoid or an intersection of a finite number of ellipsoids) and
$\epsilon_k$ is a perturbation satisfying

$$|\epsilon_k| \le \mu_k \qquad (2)$$

where $\mu_k$ is an a priori known, bounded sequence. Given a
measurement $(x_k, y_k)$, our objective is to update the parametric
uncertainty set $\mathcal{M}$ in a manner consistent with the noise
bound $\mu_k$ while reducing its volume as much as possible.
Furthermore, in order to preserve the simplicity of its description, we
restrict our attention to the case where the updated set is an ellipsoid
or an intersection of a fixed number of ellipsoids.

2.1  LITP  models 

When the model (1) is LITP,$^1$ every I/O measurement $(x_k, y_k)$
provides a constraint for the feasible parameters, namely

$$|y_k - w_k^T \theta_*| \le \mu_k \qquad (3)$$

where $w_k = w(x_k)$. In other words, $\theta_*$ is contained in the
intersection of two half-spaces,

$$H_k = \{\theta \in \mathbb{R}^m : |y_k - w_k^T \theta| \le \mu_k\} \qquad (4)$$

Thus, given $n$ pairs of $(x_k, y_k)$, the smallest set guaranteed to
contain $\theta_*$ is $S_n = \bigcap_{k=1}^{n} H_k$. A suboptimal
ellipsoid approximation of $S_n$ can easily be obtained recursively
(e.g., see [2]), starting with an initial guess
$E_0 \subset \mathbb{R}^m$ (typically, $E_0 \supseteq \mathcal{M}$) that
contains the a priori admissible values of the model parameter vector
$\theta_*$. After each successive I/O measurement is acquired, an
ellipsoid $E_k$ is constructed such that $E_{k-1} \cap H_k \subset E_k$
and such that a measure of the size of $E_k$ is minimized. Using the
volume of the set as such a measure and the notation

$$E_k = E(P_k, C_k) = \{\theta \in \mathbb{R}^m : \|\theta - C_k\|_{P_k}^2 \le 1\} \qquad (5)$$

where $P_k$ is a positive definite symmetric matrix, the update
equations become


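$$
\begin{aligned}
e_k &= y_k - w_k^T C_{k-1} \\
G_k &= w_k^T P_{k-1}^{-1} w_k \\
\alpha_k &= 1 + q(\mu_k^2 - e_k^2) + \frac{q^2 G_k e_k^2}{1 + q G_k} \\
P_k^{-1} &= \alpha_k P_{k-1}^{-1} - q \alpha_k \frac{P_{k-1}^{-1} w_k w_k^T P_{k-1}^{-1}}{1 + q G_k} \\
\det P_k^{-1} &= \frac{\alpha_k^m}{1 + q G_k} \det P_{k-1}^{-1} \\
P_k &= [P_{k-1} + q w_k w_k^T] / \alpha_k \\
C_k &= C_{k-1} + q P_k^{-1} w_k e_k / \alpha_k
\end{aligned}
$$

where $q$ is a scalar parameter, selected to minimize the volume of
$E_k$ via ($m > 1$)$^2$

$$
q_{opt} =
\begin{cases}
0 & \text{if } \beta_2^2 - 4\beta_1\beta_3 < 0 \\
\max\left(0, \dfrac{-\beta_2 + \sqrt{\beta_2^2 - 4\beta_1\beta_3}}{2\beta_1}\right) & \text{otherwise}
\end{cases}
\qquad (6)
$$

$^1$ That is, $f(x, \theta) = w^T(x)\theta$; note that affine models can
also be admitted here with some slight modifications.

$^2$ The boundedness of $P_k$, $P_k^{-1}$ can easily be ensured
regardless of the excitation properties of $w_k$ with a small
modification of the same equations [16].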
where

$$\beta_1 = (m - 1)\mu_k^2 G_k^2$$

$$\beta_2 = (2m - 1) G_k \mu_k^2 - G_k^2 + e_k^2 G_k$$

$$\beta_3 = m(\mu_k^2 - e_k^2) - G_k$$

Note that a necessary and sufficient condition for
$\mathrm{Vol}(E_k) < \mathrm{Vol}(E_{k-1})$ is $\beta_3 < 0$.
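
One ellipsoidal update step of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it represents $E = \{\theta : (\theta - C)^T P (\theta - C) \le 1\}$ by $P^{-1}$ and $C$, forms the new ellipsoid by adding $q$ times the strip inequality $(y - w^T\theta)^2 \le \mu^2$ to the ellipsoid inequality, and picks $q$ from the quadratic whose coefficients are the $\beta_i$ obtained by minimizing the determinant of $P^{-1}$. It assumes $m > 1$, $\mu > 0$ and $w \ne 0$; all function names and the numerical example are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(A, v):
    return [dot(row, v) for row in A]


def ellipsoid_update(P_inv, C, w, y, mu):
    """Return (new_P_inv, new_C, q) for one measurement |y - w^T theta| <= mu."""
    m = len(C)
    e = y - dot(w, C)        # prediction error at the old center
    Pw = mat_vec(P_inv, w)   # P^{-1} w
    G = dot(w, Pw)           # w^T P^{-1} w
    b1 = (m - 1) * mu**2 * G**2
    b2 = (2 * m - 1) * G * mu**2 - G**2 + e**2 * G
    b3 = m * (mu**2 - e**2) - G
    disc = b2**2 - 4 * b1 * b3
    q = 0.0 if disc < 0 else max(0.0, (-b2 + math.sqrt(disc)) / (2 * b1))
    alpha = 1 + q * (mu**2 - e**2) + q**2 * G * e**2 / (1 + q * G)
    new_P_inv = [[alpha * (P_inv[i][j] - q * Pw[i] * Pw[j] / (1 + q * G))
                  for j in range(m)] for i in range(m)]
    gain = mat_vec(new_P_inv, w)
    new_C = [c + q * g * e / alpha for c, g in zip(C, gain)]
    return new_P_inv, new_C, q


def quad_form_2d(P_inv, C, theta):
    """(theta - C)^T P (theta - C) for a 2x2 P_inv, inverted in closed form."""
    (a, b), (c, d) = P_inv
    det = a * d - b * c
    dx, dy = theta[0] - C[0], theta[1] - C[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Hypothetical example: unit disk prior, measurement 0.4 <= theta_1 <= 0.6.
P_inv, C, q = ellipsoid_update([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                               w=[1.0, 0.0], y=0.5, mu=0.1)
print(P_inv, C, q)
```

In this example the updated ellipse is centered near $\theta_1 = 0.49$, is much narrower along the measured direction, and still contains every point of the unit disk that lies inside the strip.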


2.2  Weakly  NLITP  models 

For NLITP models satisfying (1) and (2), it is possible to employ the
same approach as in the linear case by simply considering the
contribution of the higher order terms as a bounded perturbation. The
derivation of their bound, however, can be quite conservative if the
directionality properties of the effective regressor vector
$\partial f/\partial\theta$ are not taken into account. Thus, in order
to incorporate some structural information regarding the higher order
terms in the estimation algorithm, let us suppose that:

2.1 Assumption: For all $\theta \in E_0$ and all
$(x_k, y_k) \in \mathcal{X} \times \mathcal{Y}$, the Hessian of $f$ with
respect to $\theta$ satisfies:

$$\lambda I \le \frac{\partial^2 f}{\partial\theta^2} \le \sigma I \qquad (7)$$

Note that the constants $\sigma$, $\lambda$ can be evaluated off-line
for a given set $E_0$ and all possible operating points.

Given  an  I/O  pair  (x*,  y*),  a  constraint  on  the  pos¬ 
sible  values  of  8m  is  given  by 


0,  £  Hkm  =  {0  £  :  \yk  -  f(xkJ)\  <  fik}  (8) 

In  order  to  compute  an  estimate  of  this  set,  say  Hk 
(Hk*  C  If*),  let  8k  be  a  parameter  vector  such  that 


yk  =  f(xk}8'k)  (9) 

Assuming  that  df/dB  ^  0,  such  a  point  can  be  found 
by  a  gradient  search  starting  from  the  current  estimate 
of  8 »,  which  is  at  least  locally  convergent.  (With  a 
minor  modification,  the  same  ideas  can  be  extended 
to  account  for  the  case  where  d f/d6  may  be  arbitrarily 
small,  by  finding  9k  such  that  \yk  -  f(xk,9'k) \  <  fih,j 
By  taking  a  Taylor  expansion  around  8[  we  have 

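$$f(x_k, \theta) = f(x_k, \theta'_k) + \frac{\partial f}{\partial\theta}(x_k, \theta'_k)(\theta - \theta'_k) + \frac{1}{2}(\theta - \theta'_k)^T \frac{\partial^2 f}{\partial\theta^2}(x_k, \xi_k)(\theta - \theta'_k) \qquad (10)$$

where $\xi_k$ is a convex combination of $\theta$ and $\theta'_k$. Letting $f'_k = \frac{\partial f}{\partial\theta}(x_k, \theta'_k)$ and invoking Assumption 2.1, we arrive at the following constraints for $\theta_*$:

$$\theta_* \in \left\{\theta \in E_{k-1} : \begin{array}{l} \sigma\,\|\theta - \theta'_k + f_k'^T/\sigma\|^2 \ge \|f'_k\|^2/\sigma - 2\mu_k \\ \lambda\,\|\theta - \theta'_k + f_k'^T/\lambda\|^2 \le \|f'_k\|^2/\lambda + 2\mu_k \end{array}\right\} \qquad (11)$$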


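Thus, using the notation $c_k^\sigma = \theta'_k - f_k'^T/\sigma$, $r_k^\sigma = \|f'_k\|^2/\sigma^2 - 2\mu_k/\sigma$, $c_k^\lambda = \theta'_k - f_k'^T/\lambda$, $r_k^\lambda = \|f'_k\|^2/\lambda^2 + 2\mu_k/\lambda$, we obtain the following estimate of $H_k^*$:

$$H_k = \{\theta \in E_{k-1} : \sigma\,\|\theta - c_k^\sigma\|^2 \ge \sigma r_k^\sigma,\ \ \lambda\,\|\theta - c_k^\lambda\|^2 \le \lambda r_k^\lambda\} \qquad (12)$$

Figure 1: Typical shape of $H_k$ when $\sigma > \lambda > 0$. ($E_{k-1}$ is denoted by the dashed line.)

Figure 2: Typical shape of $H_k$ when $\lambda < 0 < \sigma$. ($E_{k-1}$ is denoted by the dashed line.)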

Depending on the signs of $\lambda$ and $\sigma$, the set $H_k$ is the intersection of $E_{k-1}$ and either the complement of two balls or a ball and the complement of another ball. Needless to say, in all cases $H_k$ is a non-convex set. The possible shapes of the set $H_k$ ($m = 2$) are illustrated in Fig. 1 and Fig. 2 for the cases where $\mu_k = 0$ and $\mu_k > 0$.

Our next objective is to find the smallest ellipsoid containing the intersection $E_{k-1} \cap H_k$. In order to achieve this objective, at least suboptimally, we first obtain two half-spaces whose intersection contains $E_{k-1} \cap H_k$ as tightly as possible and then use the ellipsoid update algorithm given for the case of LITP models. A relatively straightforward implementation of this idea is outlined by the following algorithm and illustrated in Fig. 3.

Algorithm 

1. Find a hyperplane $L_1 = \{\theta \in R^m : v_1^T\theta = a_1\}$ such that $L_1$ has at least one point on the intersection of the boundary of the sphere $(r_k^\sigma, c_k^\sigma)$ and the boundary of $E_{k-1}$, and $E_{k-1}$ is contained in the half-space $L_1^- = \{\theta \in R^m : v_1^T\theta \le a_1\}$ as tightly as possible.

2. Find a hyperplane $L_2$ parallel to $L_1$ such that $L_2$ has at least one point on the intersection of the boundary of the sphere $(r_k^\lambda, c_k^\lambda)$ and the boundary of $E_{k-1}$ and such that $L_1$ and $L_2$ enclose tightly the intersection $E_{k-1} \cap H_k$.

3. Using $L_1$ and $L_2$ in the equations for the LITP-model case, update the ellipsoid $E_{k-1}$.

4. Repeat steps 1-3 by using the sphere $(r_k^\lambda, c_k^\lambda)$ to define $L_1$ and the sphere $(r_k^\sigma, c_k^\sigma)$ for $L_2$ until convergence is achieved (typically, a couple of iterations are sufficient).

Figure 3: Panels (a) and (b) showing updates of the set $E_k$.

An important problem arising in the construction of the above algorithm is that the computation of the hyperplane parameters $v_1$, $a_1$ should be as close to optimal as possible under the constraint that the intersection $E_{k-1} \cap H_k$ is entirely contained in the associated half-space. Note that, although a suboptimal computation may result in a deterioration of the speed of convergence and a larger set $E_k$, a violation of the constraint may exclude feasible points from $E_k$. Since the latter would have severe consequences in the intended application of the algorithm, the computations should be performed so as to guarantee that under no circumstances the above constraint is violated. For this purpose, we adopt the following two-step procedure for the computation of the parameters defining the hyperplane $L_1$:

1. Given a direction $v$, find the smallest $a$ such that $E_{k-1} \cap H_k$ is contained in the half-space $v^T(\theta - \theta'_k) \le a$.

2. Use any optimization scheme to minimize $a$ with respect to $v$. (A typical starting point is the vector $c_k^\sigma - \theta'_k$.)


It is straightforward to show that the first step of the above procedure reduces to the problem

$$\max\ (y - c)^T(y - c) \quad \text{s.t.} \quad y^T P y = 1;\ P = P^T > 0$$

where $y, c \in R^{m-1}$. Using a Lagrange multiplier $\lambda$, the solution of this problem can be obtained iteratively, via a constrained Newton algorithm, by finding the unique $\lambda$ satisfying $(I + \lambda P)y = c$ and $\lambda < -1/d_{min}$, where $d_{min}$ denotes the minimum eigenvalue of $P$. Note that the convergence to the desired solution, within any desired tolerance, is guaranteed provided that the Newton iterations are constrained in the interval $(-\infty, -1/d_{min})$ (since other local extrema exist outside this interval) and that $c$ is not an eigenvector of $P$ (this can always be achieved by a small perturbation of $c$). Finally, regarding the second step, although convergence to local minima may be possible, the computation of the objective function $a$ via the above procedure ensures that even a suboptimal computation preserves the property $\theta_* \in E_k$.

Figure 4: Recursive computation of the parametric uncertainty sets for the silicon oxidation example.
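The inner maximization can be sketched numerically as follows. This is not the constrained Newton iteration mentioned in the text: it swaps in plain bisection on the multiplier over the interval $(-\infty, -1/d_{min})$, which is slower but needs no derivative. The sketch also assumes that $P$ has already been diagonalized, so that `d` holds its eigenvalues and `c` the coordinates of the center in the eigenbasis, with a nonzero component along the smallest eigenvalue. The function name is hypothetical.

```python
def farthest_point_on_ellipsoid(d, c, max_iter=200):
    """Maximize ||y - c||^2 subject to sum(d_i * y_i^2) = 1 (diagonal P).

    Stationarity gives y_i = c_i / (1 + lam * d_i); the multiplier lam is
    the root of g(lam) = sum(d_i * y_i^2) - 1 with lam < -1/min(d).
    """
    d_min = min(d)

    def g(lam):
        return sum(di * ci * ci / (1.0 + lam * di) ** 2
                   for di, ci in zip(d, c)) - 1.0

    hi = -1.0 / d_min            # g -> +inf as lam approaches hi from below
    step = 1.0 / d_min
    lo = hi - step
    while g(lo) > 0.0:           # g -> -1 as lam -> -inf, so this terminates
        step *= 2.0
        lo = hi - step
    for _ in range(max_iter):    # bisection, never evaluating g at the pole
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = lo
    y = [ci / (1.0 + lam * di) for di, ci in zip(d, c)]
    return lam, y
```

On the interval used here $g$ is monotone, which is why a bracketing method is enough; a Newton step safeguarded to stay inside the same interval would converge faster.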


2.3 An example of recursive Set-Membership estimation for weakly NLITP models

To demonstrate the behavior of the above algorithm, let us consider the following simplified dynamic model of a silicon oxidation process [17]

$$\dot{x} = \frac{B}{2x + A} \qquad (13)$$

where $x$ is the oxide thickness, and $A$, $B$ are experimentally determined constants. The solution of this differential equation is

$$t = \frac{x^2 - x_0^2}{B} + \frac{A(x - x_0)}{B} \qquad (14)$$

Letting $x_0 = 70$ and given that $A \in [48, 68]$, $B \in [426, 446]$, $x \in [100, 500]$, $\mu_k = 0.01$, we would like to recursively estimate a set containing the "actual" parameters, $A_*$, $B_*$, based on the following measurements
Table 1

k      1       2       3       4       5       6       7       8       9       10
x_k    438.39  264.83  436.60  207.73  266.16  314.92  287.17  214.88  171.33  161.49
t_k    461.23  168.11  457.51  101.19  169.83  238.84  198.26  108.79  66.19   57.74

(Here we used $A_* = 58$, $B_* = 436$.)
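The simplified oxidation model can be evaluated in both directions. The sketch below assumes the reading $t = (x^2 - x_0^2)/B + A(x - x_0)/B$ of the solution of $\dot{x} = B/(2x + A)$; the function names are hypothetical.

```python
import math


def oxidation_time(x, A, B, x0):
    """Time needed to grow the oxide from thickness x0 to thickness x."""
    return (x * x - x0 * x0) / B + A * (x - x0) / B


def oxide_thickness(t, A, B, x0):
    """Inverse map: positive root of x^2 + A*x = x0^2 + A*x0 + B*t."""
    rhs = x0 * x0 + A * x0 + B * t
    return 0.5 * (-A + math.sqrt(A * A + 4.0 * rhs))
```

Since the time is quadratic in $x$, its derivative with respect to $x$ is $(2x + A)/B$, the reciprocal of the growth rate in the differential equation, which is a quick consistency check on the closed form.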


For this problem, the algorithm outlined in the previous subsection, with the Hessian bounds being $\sigma = 0.00782$ and $\lambda = -0.00074$, yields the parametric uncertainty set estimates shown in Fig. 4. Note that the volume of this set is reduced rapidly to a rather small value ($\det P_0^{-1} = 40000$, $\det P_1^{-1} = 65.57$, $\det P_2^{-1} = 0.125$, ..., $\det P_{10}^{-1} = 0.125$).
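The off-line evaluation of the Hessian bounds can be sketched for this example by scanning a grid of parameters and operating points. The sketch assumes the model $t = f(x, \theta) = (x^2 - x_0^2)/B + A(x - x_0)/B$ with $\theta = [A, B]$, a fixed $x_0 = 70$, and the parameter box rather than the enclosing ellipsoid; the function names and the grid density are hypothetical choices.

```python
import math


def hessian_eigs(x, A, B, x0):
    """Eigenvalues of the 2x2 Hessian of f with respect to (A, B).

    f_AA = 0, f_AB = -(x - x0)/B^2, f_BB = 2*(x^2 - x0^2 + A*(x - x0))/B^3.
    """
    f_ab = -(x - x0) / B**2
    f_bb = 2.0 * (x * x - x0 * x0 + A * (x - x0)) / B**3
    root = math.sqrt(f_bb**2 + 4.0 * f_ab**2)
    return 0.5 * (f_bb - root), 0.5 * (f_bb + root)


def hessian_bounds(a_range, b_range, x_range, x0, n=21):
    """Smallest and largest Hessian eigenvalue over an n^3 grid."""
    def grid(lo, hi):
        return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

    lam, sigma = float("inf"), float("-inf")
    for A in grid(*a_range):
        for B in grid(*b_range):
            for x in grid(*x_range):
                e_min, e_max = hessian_eigs(x, A, B, x0)
                lam = min(lam, e_min)
                sigma = max(sigma, e_max)
    return lam, sigma


lam, sigma = hessian_bounds((48.0, 68.0), (426.0, 446.0), (100.0, 500.0), x0=70.0)
```

Under these assumptions the extreme eigenvalues occur at corners of the box with $x = 500$ and $B = 426$, and they agree with the bounds quoted in the text to the number of digits given there.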


3 Application to the adaptive control of a silicon oxidation process


In  this  section  we  consider  the  batch-to-batch  adaptive 
control  problem  of  a  silicon  oxidation  process  where 
the  objective  is  to  adjust  the  oxidation  time  so  that 
the thickness of the produced oxide converges to a desired value, denoted by $x_*$ [1]. We assume that the actual process is governed by the following differential equation

$$\dot{x} = \frac{B}{2x + A} + C e^{-x/L} \qquad (15)$$


which accounts for an initial reaction-dominated stage [17]. However, to emulate modelling inaccuracies, all the derivations associated with the controller design and operation are based on the simplified model

$$t = \frac{x^2 - x_0^2}{B} + \frac{A(x - x_0)}{B} \qquad (16)$$


Following [1], a simple adaptive scheme can achieve this objective, at least locally. During the process, however, it is likely that the lack of sufficient excitation for identification purposes may cause the parameter estimates to converge to physically unacceptable values (e.g., depending on the initial conditions $A$ may converge to a negative value). Although in this particular case such an event does not affect the computation and convergence of the control input, it is desirable to implement constraints in the adaptation so that the parameter estimates are within physically acceptable bounds.³ Further, since the convergence of the adaptation is only local, it is also desirable to restrict the parameter estimates to a small compact set that is reduced on-line as new measurements are obtained.

To illustrate the implementation of these ideas via on-line Set-Membership estimation, we employ the following adaptive controller for the silicon oxidation process:⁴

$$t_k = f(x_*, \theta_k)$$

³Such considerations may become important in more general cases where the process model is of the implicit form $F(x, u, \theta) = 0$; in such cases, the ability to compute an appropriate control input $u$ may depend critically on the value of $\theta$.

⁴This is a slightly different (simpler) version of the algorithm used in [1]; both algorithms, however, have similar qualitative properties and similar parameter projection techniques are applicable to both cases.

Figure 5: Simulated response of the adaptive controller; $\mu_k = 4.2$

$$\bar\theta_{k+1} = \theta_k + \left[\frac{\partial f}{\partial\theta}(x_k, \theta_k)\right]^{\dagger}[t_k - f(x_k, \theta_k)]$$

$$\theta_{k+1} = \mathcal{P}_{M \cap E_k \cap H_k^c}(\bar\theta_{k+1})$$

where $z^\dagger = z^T[zz^T]^{-1}$ for a row vector $z$, and $\mathcal{P}_X$ is a projection on the convex set $X$ [18].⁵ Further, $E_k$ denotes the updated parametric uncertainty set and $H_k^c$ is the intersection of the two half-spaces containing $E_k \cap H_k$. Both of these sets are computed by the Set-Membership algorithm of the previous section, operating in parallel with the adaptive controller.
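The unprojected part of the parameter update can be sketched for the oxidation model as follows. The sketch assumes $f(x, \theta) = (x^2 - x_0^2)/B + A(x - x_0)/B$ with $\theta = [A, B, x_0]$ and uses the row-vector pseudo-inverse $z^T/(zz^T)$; the projection onto the constraint sets is deliberately omitted, and the function names are hypothetical.

```python
def f(x, theta):
    """Oxidation time predicted by the simplified model."""
    A, B, x0 = theta
    return (x * x - x0 * x0) / B + A * (x - x0) / B


def grad_f(x, theta):
    """Row vector z = df/dtheta evaluated at (x, theta)."""
    A, B, x0 = theta
    return [(x - x0) / B, -f(x, theta) / B, -(2.0 * x0 + A) / B]


def update_step(theta, x_k, t_k):
    """One step theta + z^T (z z^T)^{-1} (t_k - f(x_k, theta)), no projection."""
    z = grad_f(x_k, theta)
    err = t_k - f(x_k, theta)
    zz = sum(zi * zi for zi in z)
    return [th + zi * err / zz for th, zi in zip(theta, z)]
```

The step is the minimum-norm correction that removes the prediction error of the linearized model, so for this nonlinear model the actual error is reduced but generally not eliminated in a single step.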

The properties of the above adaptive controller are similar to those in [1]. That is, locally around $(x_*, \theta_*)$:

1. $\theta_k \in M \cap E_{k-1}$, $\forall k$.

2. The tracking error $e_k = x_k - x_*$ converges to the residual set $\{e : |e| \le d(\sigma, \lambda, \mu_k)\}$ exponentially fast, where $d$ is a dead-zone-like threshold s.t. $d \to 0$ as its arguments approach zero.

3. In the ideal case where $\mu_k = 0$, the tracking error $e_k$ converges to zero, even if $d > 0$.


Applying this adaptive controller to the silicon oxidation example, with

$$f(x, \theta) = \frac{x^2 - \theta(3)^2}{\theta(2)} + \frac{\theta(1)(x - \theta(3))}{\theta(2)}$$

where $\theta = [A, B, x_0]^T$, we obtain the results shown in Figures 5 and 6.

For the simulations, we used the following nominal values: $A = 3854.87\exp(-5336.43/T)$; $B = 2.8333 \times 10^{11}\exp(-25754.06/T)$; $C$ = 202|g.3fi^ $\exp(-25754.06/T)$; $L = 100$ Å; where $T$ is in °K, $B$ in Å²/s and $A$ in Å/s. In all cases, the temperature initially follows a ramp with slope 300°C/s and, after reaching 1000°C, remains constant for the rest of the operation.

⁵When $X$ is an intersection of sets $X_i$, the projection is computed by sequential orthogonal projections on each $X_i$ until convergence is achieved.

Figure 6: Simulated response of the adaptive controller with a more conservative selection of the perturbation bounds; $\mu_k = 10$

The desired oxide thickness is $x_* = 400$ Å while the region of operation is between 200 Å and 600 Å. A nominal value for the parameter $\theta$ was computed off-line (e.g., in practice by processing a few test wafers) as $\theta_o = [67.6, 513.8, 100.8]$. This value was obtained by using a nonlinear least squares fit, with the constraints: $A \in [0, 95]$, $B \in [450, 538]$, $x_0 \in [0, 205]$. The corresponding parametric uncertainty set is $\theta(1) \in [47, 71]$, $\theta(2) \in [15, 1215]$, $\theta(3) \in [0, 334]$. The bound of the measurement noise in terms of the oxidation time is 4.2 seconds while $\sigma = 0.0096$ and $\lambda = -0.0060$. Finally, for the initialization of the adaptation we selected $\theta_0 = [30, 500, 150]^T$, $E_0 = \{\theta \in R^m : (\theta - c_0)^T P_0^{-1} (\theta - c_0) \le 1\}$ where $c_0 = [47.5, 494, 167]^T$ and

$$P_0 = \begin{pmatrix} 0.1123 \times 10^{-3} & 0 & 0 \\ 0 & 0.1860 \times 10^{-3} & 0 \\ 0 & 0 & 0.0139 \times 10^{-3} \end{pmatrix}$$

whose inverse has determinant $\approx 3.4538 \times 10^{12}$.

As demonstrated by the simulations, the convergence of the algorithm is fairly rapid while the parameters remain within physically meaningful bounds. Furthermore, although the obtained measurements do not provide sufficient (or even persistent) excitation for identification purposes, the volume of the parametric uncertainty set is reduced considerably.

Abstract

In this note, a reduced order, physically-motivated empirical model is proposed and validated via simulation for the single wafer tungsten Low Pressure Chemical Vapor Deposition (LPCVD) processing step. The so-called multiple response surface method is adopted to describe the spatial deposition nonuniformity across a wafer surface. Based on this modeling methodology, a simple adaptive optimization control strategy is developed by which the average deposition thickness at the wafer surface is controlled to a desired level while its variation across the wafer surface is minimized. Simulation results demonstrate the effectiveness of the control strategy and its potential capability of rejecting disturbances during the process. In this study, a simulation platform (CFDSWR) is used to represent the single wafer tungsten LPCVD process. The control strategy introduced here is quite general and applicable to other processing steps as well.

1  Introduction 

In the semiconductor manufacturing industry, the process by which an integrated circuit (IC) is fabricated involves a large number of processing steps. Each processing step involves a single wafer, or a batch of wafers in a boat, being placed in various pieces of equipment to accomplish tasks such as oxidation, reactive ion etch, chemical vapor deposition, lithography, etc. Many of these pieces of equipment are basically reactor vessels into which the wafer (or wafers) is placed and then exposed to controlled flows of gases and reactants that change the characteristics of the surface layers of the wafer through either a chemical or a physical process, or both. Each processing step is controlled by maintaining the manipulated variables (control inputs) near appropriately selected set-points, e.g., temperature, pressure, gas flows and processing time.

The first difficulty associated with this control problem is the modeling of the process. A considerable amount of effort has been devoted to the mathematical modeling of many individual processes based on the underlying physical and chemical mechanisms [1] [2]. Unfortunately, almost none of these models is suitable for control purposes, mainly due to their complexity and the large number of parameters that must be determined. In an attempt to alleviate these problems, polynomial models are frequently used to approximate the static state response of a process step. The parameters of the model are determined by applying traditional response surface techniques and linear regression schemes [3]. However, the practical usefulness of the polynomial model is limited by the complexity often required to yield the desired accuracy over the range of operating conditions. In turn, such models may be over-parametrized for a smaller range of operating conditions, something that may cause parameter convergence/drift problems when on-line adaptation is used in the presence of disturbances [4]. Although a remedy for these problems can be found in the use of adaptation dead-zones, it is reasonable to seek an empirical, but physically motivated model whenever possible, in an effort to minimize the number of adjustable parameters. This is the approach adopted in this study.

*This work was supported in part by ARPA under grant no.

Even though set-points of the manipulated variables for a process may be kept constant, process conditions and the associated measurements generally drift or fluctuate during the machine operation. It is therefore important to detect these trends in the process and adjust the set-points to compensate for their effects. This is both a real time problem, occurring during the processing of a wafer (or batch of wafers) using in-situ measurements, and a run-to-run problem in which machine settings are adjusted between runs. To alleviate these problems, several control strategies have been proposed, including statistical process control [5], model based control [6] and model prediction and correction [7]. For the same purpose, two of the authors (Tsakalis and Crouch) have proposed an adaptive control scheme [8] which has the potential of addressing this issue in the case of run-to-run control. An application of this technique has been considered in the experimental manufacturing process of a capacitor [9]. Suitable variations are concurrently being developed to deal with the real time issue. As the wafer size used in device fabrication continues to increase, it becomes more difficult to maintain acceptable processing uniformity across the wafers. Consequently, not only must the average processing condition on the wafer surface be accurately controlled, but it is equally important to minimize the variation of the processing conditions across the wafer surface at the same time. Regarding the latter issue alone, some experimental and analytical (or numerical) methods have been investigated (see, e.g., [10]). These studies are mainly concerned with the optimal design of the processing reactor vessel or equipment operation set points, or both. More recently some control problems have been considered for Rapid Thermal Processing (RTP) processes where the distribution of power to the heater lamps was controlled so as to achieve the most uniform temperature distribution possible on the wafer surface and, hence, indirectly improve the thickness uniformity [7] [11]. In general, little effort has been devoted to the combined issue of accurately controlling the average processing conditions while minimizing their variation. These are precisely the main objectives of the present study.

Figure 1: Schematic diagram of the Single Wafer LPCVD reactor

2 Model development for single wafer tungsten LPCVD

Chemical Vapor Deposition (CVD) is one of the important processing steps in IC fabrication; it uses volatile compounds, containing the species to be deposited, to form a thin solid film on a solid surface such as a silicon wafer. Single wafer Low Pressure CVD (LPCVD) is becoming more competitive compared with its multi-wafer counterpart due to its great potential in improving wafer-to-wafer deposition uniformity. A schematic diagram of a typical single wafer LPCVD reactor is illustrated in Fig. 1.

In the tungsten LPCVD process, a thin layer of solid tungsten is to be deposited on a wafer surface, with tungsten hexafluoride ($WF_6$) and hydrogen ($H_2$) as the reactant gases. The chemistry governing the process is

$$WF_6 + 3H_2 \rightarrow W + 6HF$$

Figure 2: Relative error between simulation and prediction data of deposition rate

The manipulated variables (control inputs) involved in the processing step are the susceptor temperature $T$ (K), total pressure $P$ (torr) and flow rates of $WF_6$, $H_2$ and inert carrier gas (sccs). The output variables (controlled outputs) are deposition thickness (or growth rate) and its spatial nonuniformity across the wafer surface. For control purposes, the modeling of such a process aims to establish relationships between the manipulated and output variables. Before any candidate models are proposed, experimental design and statistical analysis are usually helpful in determining a suitable model structure. In this note, we discuss the use of a simulation test bed rather than an actual reactor for model development. Simulation data are generated using a simulation platform called CFDSWR [16] [17], that simulates an LPCVD reactor. A full factorial experiment design was adopted and Yates' analysis [12] was applied to the simulation data in order to determine the most significant relationships between the manipulated variables and the output variables. The results show that the susceptor temperature $T$ and the total pressure $P$ are the two dominant factors which affect both deposition growth rate and its nonuniformity. Based on a physical analysis given by [13] and in combination with the above statistical analysis results, we developed a reduced-order model relating the tungsten deposition growth rate ($GR$) with the susceptor temperature and the total pressure at a certain point on the wafer surface. This reduced-order model is given as

c*=*«‘“'/T)nS  '  « 

where $K_0$ is a constant, $\theta_1$, $\theta_2$ and $\theta_3$ are unknown model parameters determined by fitting the model to the simulation data (using, for example, a Gauss-Newton nonlinear regression algorithm [14]).

Modeling the spatial nonuniformity normally presents a challenge since the underlying map from the manipulated variables to the nonuniformity is in general very complicated. A possible approach is to include spatial information into the model parameters introduced above so that the model may be capable of "reflecting" the spatial variations of the deposition and hence the nonuniformity can be defined based on the model. On the other hand, the so-called multiple response surface method, developed by [15], successfully circumvents the difficulty by using a simpler strategy. Since in most cases the spatial nonuniformity is not measured directly but is calculated from measurements of other output characteristics at specified locations, it may be more efficient to fit the measurements first and then construct the model for the spatial nonuniformity. This is different from the traditional approach of first calculating the value of the nonuniformity based on the measurements and then fitting the model parameters accordingly. For example, given $N$ deposition thickness measurements at different points along the wafer radius and assuming the processing conditions have a circular symmetry, a suitable measure of the deposition nonuniformity is the standard deviation divided by the average of these thickness measurements. Fig. 2 shows the error between the simulation data and the model predicted ones. The proposed model is able to accurately predict the deposition growth rate over a fairly broad range of both temperature and pressure.
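The nonuniformity measure described above, the standard deviation of the thickness measurements divided by their average, can be computed as in the following sketch. The text does not say whether the population or the sample standard deviation is meant; the population form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def nonuniformity(thicknesses):
    """Standard deviation of the thickness samples divided by their mean.

    Uses the population standard deviation (an assumption); swap in
    statistics.stdev for the sample form.
    """
    avg = mean(thicknesses)
    if avg == 0:
        raise ValueError("average thickness must be nonzero")
    return pstdev(thicknesses) / avg
```

A perfectly uniform film gives zero, and the measure is dimensionless, so it can be compared across runs with different target thicknesses.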


3 Adaptive Control of the Deposition Process

Let us assume that a processing step can be approximated by a model

$$\hat{x} = f(u, \theta) \qquad (2)$$

where $\hat{x}$ denotes the predicted state (vector) starting from an initial state $x_0$ after a certain amount of processing time, and $u$ stands for the set point of the manipulated variables. Note that the processing time can also be considered as a control input, included in $u$. Finally, $\theta$ denotes a vector of unknown, adjustable parameters. We assume that there exists a vector $\theta^*$ for which (2) is a "good" approximation of the actual process; i.e., $x = f(u, \theta^*)$, where $x$ is a measured state which is available from the processing step. Our first objective is to control the state $x$ to a desired level $x^*$ so that a certain prescribed value $y^*$ depending on $x^*$ is achieved. That is,

$$y^* = \phi(x^*) = g(u, \theta^*) \qquad (3)$$

The second control objective can be described as

$$\min_u J(x, u) = \min_u h(u, \theta^*) \qquad (4)$$

where $J$ denotes any suitable cost functional, defined in terms of the state and control input. If $\theta^*$ were known then (3) and (4) could be used to determine desirable, locally unique, values of the control inputs. However, in actual control problems, the value $\theta^*$ is partially unknown. Hence, we seek an iterative adaptive control algorithm enabling the computation of updates of the parameter and control vectors, while asymptotically achieving (at least approximately) the above two objectives. Conceptually, the adaptation must be driven by a computable error $(\hat{x} - x)$, together with the optimization problem

$$\min_u h(u, \theta) \qquad (5)$$

$$\text{s.t. } g(u, \theta) - y^* = G(u, \theta) = 0 \qquad (6)$$


To generate a candidate algorithm, we re-cast the problem as the solution of an underdetermined system of nonlinear equations. We let $\lambda$ denote the Lagrange multiplier and set

$$H(u, \theta, \lambda) = \frac{\partial h}{\partial u} + \lambda^T \frac{\partial G}{\partial u} \qquad (7)$$

The control problem may be reformulated as one of finding solutions of the following system of nonlinear equations

$$x - f(u, \theta) = 0$$

$$H(u, \theta, \lambda) = 0 \qquad (8)$$

$$G(u, \theta) = 0$$

with respect to the vectors $u$, $\theta$ and $\lambda$. This is basically the approach adopted in the paper by two of the authors (Tsakalis and Crouch) [8]. The idea employed there is to generate a run-to-run adaptive control algorithm by solving the equations iteratively using a generalized form of Newton's algorithm. Although the same method may also be applied in our case, the possibly high sensitivity of $u^*$ (optimal solution of $u$) on the parameters $\theta$ and the coupling of the parameter and control update rules often result in unacceptably large variations of the control input. To circumvent this problem, one could employ parameter projection techniques [4, 18], possibly combined with modifications of the cost functional so as to penalize excessive control inputs.

In this study, however, we adopt a simpler alternative approach (motivated by the indirect adaptive control paradigm) in which the parameter update and control update are decoupled. That is, we first compute the parameter update by applying a sequence of nonlinear regression steps to the first components of the equation (8)

$$\hat\theta(i + 1) = \hat\theta(i) + \left[\frac{\partial f}{\partial\theta}\right]^{\dagger}(x(k) - \hat{x}(i)) \qquad (9)$$

until $|(\hat{x}(i) - x(k))/x(k)| < \epsilon$, where $[\partial f/\partial\theta]^{\dagger}$ is the pseudo-inverse of $\partial f/\partial\theta$, and then update the control $u(k + 1)$ using (5) and (6) with $\theta$ replaced by $\hat\theta(k + 1)$. In general, we cannot expect $\hat\theta(k)$ and $u(k)$ to converge to $\theta^*$ and $u^*$. It is therefore unreasonable to expect that the optimal value $h(u^*, \theta^*)$ of the minimization problem is achieved asymptotically, even in the ideal case of exact models. However, in view of the available indirect adaptive control results, we may argue that such an adaptive control algorithm converges to a residual set ($\forall k > k_0$) where

$$\hat{x}(k) = f(u(k), \hat\theta(k)) = f(u^*, \theta^*) = x^*$$

$$\phi(\hat{x}(k)) = \phi(x^*) = y^* \qquad (10)$$

$$|h(u(k), \hat\theta(k)) - h(u^*, \theta^*)| \le \sigma$$

In our studies we used the threshold value $\epsilon = 0.001$ that resulted in $\sigma < 0.05$. It should be mentioned, however, that the dependence of $\sigma$ on the adaptation parameters and the range of operating conditions is rather complicated; and although this dependence could be determined via a tedious worst-case analysis, such a direction was not pursued in our study.

Figure 3: Adaptation of the model parameters during the process

4  Simulations  and  Discussion 

To  test  the  proposed  strategy  for  controlling  single 
wafer  tungsten  LPCVD  processing  step,  the  CFDSWR 
is  used  to  represent  the  actual  process.  The  susceptor 
temperature  (T)  and  total  pressure  (P)  are  selected 
as  the  two  manipulated  variables  while  the  other  three 

^antb  ThefloVa^  ^  r^taat  «ases)  are  kept  con- 
stant.  The  output  variables  are  deposition  thickness 

and  its  spatial  nonuniformity.  We  assume  that  the 

thiclmess  measurements  denoted  by  zf  are  measured 

at  different  points  along  the  wafer  radius  in  five-second 

time  intervals  Four  measurement  points  are  obtained 

tbe  sunalation i  with  more  samples  near  the  edge 

ieatosWtafWWhiere  hC  VariaLtion  “  Process  conditions  is 
greatest.  We  also  assume  that  the  instantaneous  depo¬ 
sition  rate  measurements  at  each  point  x<  ( A/ min )  are 
not  available,  but  can  be  approximated  by  subtiS 
mg  the  current  measured  thickness  by  the  previously 
measured  one  and  dividing  by  the  time  differences  (5 

Our first objective is to deposit a desired spatial
average thickness (810 Å) of solid tungsten on the wafer
surface in a given period of time (90 sec). We also want
to minimize the deposition nonuniformity based on
four thickness measurements. Here, we consider an
operating range of T ∈ [650, 730] (K) and P ∈ [illegible],
with the flow rates of WF6, H2 and a third gas [illegible]
being [0.25, 0.0, 8.0] (sccm), respectively. The first
objective can be met by following one of two approaches.
The first is to control the average deposition rate to
a desired value. The other is a quasi-real-time
strategy in which we first pre-define a reference thickness
trajectory based on the first objective and then try
to control the average thickness growth to follow the
reference trajectory. In our study, the second
strategy is adopted due to its potential capability of
rejecting disturbances and drift during processing. A
constant thickness increment (per five seconds) trajectory
is used in the simulation. Note that we do not imply
that a constant thickness increment trajectory
is the "best" one with respect to the final deposition
nonuniformity. In fact, determining an optimal
reference trajectory is an interesting problem in itself which
is left as a topic of future research.
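
A constant-increment reference trajectory of the kind described above can be generated as in the following sketch. The function name is hypothetical; the target thickness (810 Å), the deposition time (90 sec) and the sampling interval (5 sec) are the values quoted in the text.

```python
def constant_increment_trajectory(final_thickness, total_time, dt):
    """Reference thickness at each sampling instant, growing by the
    same increment every dt seconds and ending at final_thickness."""
    steps = round(total_time / dt)
    increment = final_thickness / steps
    return [increment * k for k in range(steps + 1)]


reference = constant_increment_trajectory(810.0, 90.0, 5.0)
print(len(reference), reference[1], reference[-1])
```

The tracking error at sample k is then the measured average thickness minus `reference[k]`.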


Figure 4: Control inputs: Temperature (T) and Pressure (P)

Figure 5: Thickness trajectory $z_{avg}$ (Å) and the
corresponding growth rate GR (Å/min)


In the simulation, the constrained optimization
problem was solved using standard numerical
optimization algorithms (MATLAB). The simulation
results are shown in Fig. 3-7. Fig. 3 shows the
adaptation process of the model parameters corresponding
to the first measurement point. The set-point changes
of the control inputs (Fig. 4) indicate that, in order to
minimize the nonuniformity, we must decrease the
process temperature as much as possible. On the other
hand, the pressure must be increased so as to
maintain a suitable deposition rate. Fig. 5 shows the average
thickness growth trajectory $z_{avg}$ and the
corresponding growth rate GR generated from the CFDSWR at
five-second intervals. The actual thickness
nonuniformity, defined as the standard deviation divided by
the average, is shown in Fig. 6.

To test the capability of rejecting process
disturbances, we introduce a step change increase of the hydrogen
flow rate at 30 seconds. After the change, the
susceptor temperature decreases in response to the
disturbance and quickly converges to a new level
minimizing spatial nonuniformity subject to the desired
average growth rate constraint. On the other hand,
the pressure still remains the same in order to
maintain the desired growth rate (Fig. 4). This behavior
simply indicates that for the given range of operating
conditions, temperature has the most dominant effect
Figure 6: Thickness nonuniformity defined as the
standard deviation of the 4 measurements divided by the
average

Figure  7:  Error  between  the  measured  and  the  desired 
thickness 


on the minimization problem, with the minimum
occurring at the boundary of the control space where P is
at its maximum value. Fig. 7 shows the error between
the desired thickness values defined by the reference
trajectory and the "actual" one generated from the
CFDSWR at each measurement time. The thickness
deviation caused by the disturbance vanishes quickly
after a few samples.


5  Conclusions

An accurate, yet simple, physically-motivated empirical
model for the single wafer tungsten LPCVD processing
step is considered. Adopting a multiple response
surface method, a reasonably accurate description
of the spatial nonuniformity is obtained. A simple
adaptive  optimization  strategy  is  introduced  by  solv¬ 
ing  a  decoupled  problem  of  model  parameter  updates 
and  control  updates.  The  effectiveness  of  this  control 
strategy  is  illustrated  by  means  of  a  simulation  exam¬ 
ple.  It  should  be  noted  that  in  an  application  to  an 
actual  process,  rather  than  a  simulation  platform,  the 
temperature  cannot  be  changed  instantaneously  but 
must  be  controlled  by  regulating  the  power  supplied 
to  the  susceptor  heater.  However,  since  the  relation¬ 
ship  between  the  power  and  the  susceptor  tempera- 
ture  can  typically  be  approximated  by  a  first  order 
ODE,  such  a  description  can  easily  be  incorporated  in 
our  formulation,  leaving  the  main  idea  of  the  adap¬ 
tive  optimization  scheme  largely  unaffected.  This,  to¬ 
gether  with  robustness  issues  arising  from  noisy  mea¬ 
surements  and/or  modeling  errors  as  well  as  the  ex¬ 
perimental  validation  of  such  control  schemes  are  left 
as  topics  of  future  work. 
Summary

In  this  note  we  introduce  a  novel  modeling  technique, 
in  an  effort  to  develop  a  physically-motivated  empiri¬ 
cal  model  of  the  deposition  rate  and  spatial  deposition 
nonuniformity  for  a  single-wafer  Tungsten  Silicide  Low 
Pressure  Chemical  Vapor  Deposition  (LPCVD)  process¬ 
ing  step.  In  general,  such  a  description  is  difficult  to  ob¬ 
tain  due  to  the  complexity  of  the  mapping  between  the 
spatial nonuniformity and manipulated variables.
Combining the so-called multiple response surface method
[5] with a "feedback-like" model structure, relating
reactant partial pressures with manipulated variables, we
develop a reasonably accurate description of the
deposition rate and spatial deposition nonuniformity across
the wafer surface. This model offers good fitting accuracy
with  fewer  adjustable  parameters  compared  with  the 
more  traditional  polynomial-structure  models. 

Further, based on this modeling methodology, we
derive a run-to-run adaptive control/optimization
strategy aiming to achieve prescribed values of the
average deposition rate and silicon to tungsten ratio at the
wafer surface, while minimizing the variation of the
deposition rate across the wafer surface. Such
objectives are critical, and quite common, in the modern
semiconductor industry. The effectiveness of this
control strategy is demonstrated by simulation results,
using the simulation platform CFDSWR [3, 4] to
represent the single wafer tungsten silicide LPCVD process.
(The CFDSWR is a computer program that simulates
the reaction and transport phenomena occurring during
LPCVD processes.)

In order to describe the basic idea behind our
simplified, control-oriented modeling approach, we consider a
Tungsten Silicide LPCVD process. In this process the
reactants, tungsten hexafluoride (WF6) and
dichlorosilane (SiH2Cl2; DCS), are fed in a reactor containing the
wafer. For the deposition of tungsten silicide (WSix)
films, at least three reactions, depositing Si, WSi2, and
W5Si3, seem to contribute to the apparent deposition
rate and film composition [1]:

SiH2Cl2 → Si + 2HCl

WF6 + 4SiH2Cl2 → WSi2 + 8HCl + SiF4 + SiF2
5WF6 + 11SiH2Cl2 → W5Si3 + 22HCl + 7SiF4 + SiF2

Although the reaction stoichiometries have not been
formally established, experimental measurements have
been used to correlate the apparent WSix deposition
rate and film composition with the wafer temperature
and reactant partial pressures through (apparent)
reaction rate expressions. These kinetic models, translated
to deposition rates for the three solid products Si, WSi2,
and W5Si3, are given as

Gi  =  kne<~-t"/T)p1Pi 

G,  =  tsieC-*../r,_5S_  (!) 

a 3  = 

where $G_i(\cdot)$, i = 1,2,3, are the deposition (growth) rates
of Si, WSi2, W5Si3, respectively, $P_1$, $P_2$ are the partial
pressures of the reactants WF6, DCS and $k_{ij}$ are
constants, e.g., see [1, 2].

The typical objective in such a process is to adjust the
manipulated variables (control inputs) $u = [T, P, F_1]^T$
where T is the susceptor temperature, P is the total
pressure and $F_1$ is the WF6 flow rate, so as to maintain
desired values for the total deposition rate and
silicon-to-tungsten ratio, while minimizing the variation of the
deposition rate across the wafer surface. Estimates of
these quantities can be obtained given measurements
at different points on the wafer surface. Here,
assuming a radially symmetric reactor, we define the spatial
nonuniformity (Nun) across the wafer surface as the
standard deviation of growth rates at the N different
measurement points along the wafer radius, divided by the
average growth rate:

$$ \mathrm{Nun} = \left[ \frac{1}{N} \sum_{n=1}^{N} (GR_n - GR)^2 \right]^{1/2} / \, GR $$

$$ GR_n = \sum_{i=1}^{3} G_i(n) \tag{2} $$

$$ GR = \frac{1}{N} \sum_{n=1}^{N} GR_n $$
where $G_i(n)$, $GR_n$ are the deposition rate of the solid
species i and the total deposition rate at the
measurement point n, and GR is the spatially averaged
deposition rate. Further, from the stoichiometry of the
apparent reaction model, the silicon to tungsten ratio (Si/W)
of the deposited film at the wafer center is given by

$$ \mathrm{Si/W} = \frac{A_1 G_1 + 2 A_2 G_2 + 3 A_3 G_3}{A_2 G_2 + 5 A_3 G_3} \tag{3} $$

where $A_i$ denotes the density to molecular weight ratio
of the solid product i.
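
As a minimal sketch of the two output quantities discussed above, the following code computes the nonuniformity as the standard deviation of the point growth rates divided by their mean, and the Si/W ratio from the molar deposition rates of Si, WSi2 and W5Si3. The function names are hypothetical. The use of the population (1/N) standard deviation is an assumption, and so is the Si/W expression, which is simply the atom count implied by the three solid products (1 Si; 2 Si and 1 W; 3 Si and 5 W).

```python
import math


def nonuniformity(point_rates):
    """Standard deviation of the growth rates at the measurement
    points divided by their average (population form, 1/N)."""
    n = len(point_rates)
    mean = sum(point_rates) / n
    variance = sum((g - mean) ** 2 for g in point_rates) / n
    return math.sqrt(variance) / mean


def si_to_w_ratio(growth, molar_density):
    """Si/W atomic ratio from growth rates (G1, G2, G3) of Si, WSi2,
    W5Si3 and their density-to-molecular-weight ratios (A1, A2, A3)."""
    g1, g2, g3 = growth
    a1, a2, a3 = molar_density
    silicon = a1 * g1 + 2.0 * a2 * g2 + 3.0 * a3 * g3
    tungsten = a2 * g2 + 5.0 * a3 * g3
    return silicon / tungsten


print(nonuniformity([90.0, 110.0]))
print(si_to_w_ratio((0.0, 1.0, 0.0), (1.0, 1.0, 1.0)))
```

A film made of pure WSi2 gives a ratio of 2, which is a quick sanity check on the stoichiometric weights.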

The main difficulty in obtaining a simplified,
control-oriented model of this process (relating the
manipulated variables with the outputs of interest) is
associated with the dependence of the local reaction and
deposition rates on the unmeasured partial pressures of
WF6 and DCS on the wafer surface. To develop such
a relationship, we observe that in a reaction-free
environment the partial pressures $P_1$ and $P_2$ could be
determined by the corresponding mole fractions and the total
pressure; on the other hand, when a chemical reaction
occurs, $P_1$ and $P_2$ at the wafer surface will drop due to
the consumption of WF6 and DCS during the reaction.
Assuming that the amount of reduction of the partial
pressure is proportional to the apparent overall
reaction rate for that species, we employ multiple response
surface techniques to approximate these rates by an
empirical expression of the manipulated variables. Thus,
we arrive at the following empirical (but physically
motivated) model, relating the manipulated variables with
the reactant partial pressures at a point on the wafer
surface:

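$$ P_1 = \theta_{11} \frac{F_1 P}{F_1 + A} - (\theta_{12} + \theta_{13} T + \theta_{14} P + \theta_{15} T P + \theta_{16} T P F_1) $$

$$ P_2 = \theta_{21} \frac{A P}{F_1 + A} - (\theta_{22} + \theta_{23} T + \theta_{24} P + \theta_{25} T P + \theta_{26} T P F_1) \tag{4} $$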

where $\theta_{ij}$ are model parameters and A is a constant
equal  to  the  fixed  flow  rate  of  DCS.  Notice  that  the  “re¬ 
action  feedback”  modeling  idea  is  invoked  in  selecting 
the  parametric  structure  of  the  above  model.  That  is, 
the  first  term  in  equations  (4)  describes  the  reaction-free 
dependence  of  the  partial  pressures  on  the  manipulated 
variables  while  the  rest  describe  the  effect  of  the  partial 
pressure  reduction  due  to  the  reactions. 

Next, substituting the reactant partial pressure
expressions (4) in the deposition rate, nonuniformity and
ratio expressions (1), (2) and (3), we obtain a model of
the quantities of interest in terms of the manipulated
variables $u = [T, P, F_1]^T$ and the vector of adjustable
parameters $\theta = [\theta_{11}, \ldots, \theta_{16}, \theta_{21}, \ldots, \theta_{26}]^T$. Notice that
since  the  deposition  models  of  all  three  solid  products 
share  the  same  set  of  parameters  at  each  measurement 
point,  the  total  number  of  parameters  is  significantly  re¬ 
duced  compared  to  an  approach  that  would  model  the 
deposition  rate  of  the  solid  products  individually. 

Given the simplified model (2) and (3), with the
various rate expressions determined by (1) and (4), the
control problem may now be formulated as

$$ \min_u \mathrm{Nun}(u, \theta^*) \quad \text{s.t.} \quad \begin{cases} GR(u, \theta^*) - GR^* = 0 \\ \mathrm{Si/W}(u, \theta^*) - \mathrm{Si/W}^* = 0 \\ u_{min} \le u \le u_{max} \end{cases} \tag{5} $$

where GR* and Si/W* are desired values of the
average deposition rate and Si/W ratio. Here $\theta^*$ is used
to denote the model parameters for which the rate
expressions (1) offer a "good" approximation of the actual
process. Based on preliminary simulation studies, such
a value exists but may (and typically does) depend on
the processing conditions. Consequently, $\theta^*$ is only
partially known, a situation that is expected to be even
more pronounced in an actual processing system. In
order to obtain an approximate solution to the above
control problem, we adopt an indirect adaptive control
approach whereby the estimates of the model
parameters $\theta$ are updated using deposition rate measurements,
so that $G_i(u, \theta; n) \approx G_i(u, \theta^*; n)$ where $G_i(u, \theta^*; n)$ is
the deposition rate of product species i at a point n
(e.g., see [6, 7, 8]). In turn, these estimates are used
in (5) instead of $\theta^*$ and the constrained minimization
problem is solved to determine the control input for the
next run.
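
The alternation between a parameter update and a control update can be illustrated with a deliberately reduced sketch. Everything in it is an assumption made for illustration: the one-parameter Arrhenius-type rate model, the numerical constants, and the "true process" function that stands in for the simulator. It keeps only the average-rate constraint and omits the nonuniformity minimization and the Si/W constraint of problem (5).

```python
import math

E_ASSUMED = 6000.0  # hypothetical activation temperature (K), taken as known


def model_rate(k, T, P):
    """Toy rate model: prefactor * Arrhenius term * pressure."""
    return k * math.exp(-E_ASSUMED / T) * P


def true_process(T, P):
    """Stand-in for the reactor/simulator; its prefactor (3.0e6)
    is not known to the controller."""
    return model_rate(3.0e6, T, P)


def run_to_run(target, T, k_initial, runs=4):
    """Certainty-equivalence loop: choose P from the current estimate,
    run the process, then re-estimate the prefactor from the measurement."""
    k_hat = k_initial
    history = []
    for _ in range(runs):
        P = target / (k_hat * math.exp(-E_ASSUMED / T))     # control update
        measured = true_process(T, P)                        # process run
        history.append(measured)
        k_hat = measured / (math.exp(-E_ASSUMED / T) * P)    # parameter update
    return k_hat, history


k_final, rates = run_to_run(target=2000.0, T=700.0, k_initial=1.0e6)
print(rates)
```

Because the toy model has the same structure as the toy process, the estimate is exact after a single run; with structural mismatch or noise the convergence would be gradual, which is the robustness question the text leaves open.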

The  behavior  of  the  above  adaptive  controller  was 
tested  by  using  the  CFDSWR  platform  to  simulate  the 
actual  process.  The  simulation  results  (available  upon 
request)  show  that  the  trajectories  of  the  average  total 
deposition  rate  and  Si/W  ratio  are  controlled  to  their 
individual  desired  values  (2000  A/min  and  2,  respec¬ 
tively)  while  the  nonuniformity  is  “minimized.”1 
Abstract

The application of optimal control theory to the
process of low pressure chemical vapor deposition on
patterned surfaces can substantially decrease the
processing time for a given step coverage, compared with the
programmed rate chemical vapor deposition (PRCVD)
process. The control model is developed from the
simultaneous one-dimensional Knudsen diffusion and
chemical reaction description. For such a model, the
optimal control problem is formulated as finding a
temperature trajectory yielding the minimum
processing time, and its solution is computed numerically via
a modified variation of extremals method. For the
thermally activated deposition of silicon dioxide from
tetraethylorthosilicate (TEOS) and for a ninety-six
percent step coverage, the optimal control-generated
temperature trajectory results in time-savings of
approximately twenty-eight percent, when compared to
the PRCVD approach.

1.  Introduction 

Higher  scales  of  device  integration  and  cost  reduction 
of  integrated  circuits  are  two  major  trends  in  the  micro¬ 
electronics  industry.  In  order  to  achieve  cost  reduction, 
a  common  practice  is  to  increase  the  device  through¬ 
put.  In  turn,  such  an  increase  can  be  achieved  by  in¬ 
creasing  the  number  of  devices  per  wafer.  Therefore, 
four  inch  wafer  fabrication  lines  are  being  converted  to 
five or six and even eight inch wafer lines. This
conversion, however, can adversely affect the inter-wafer
and intra-feature film thickness uniformity in low
pressure chemical vapor deposition (LPCVD) processes.
Consider,  for  example  conventional  axial-flow,  volume- 
loaded,  multiple  wafer-in-tube  reactors  (MWRs)  com¬ 
monly  used  in  LPCVD  processes  [2].  To  promote  inter¬ 
wafer  uniformity,  for  any  wafer  size,  source  gas  con¬ 
version  must  be  maintained  at  a  rather  low  level  in 
order  to  minimize  axial  reactant  concentration  gradi¬ 
ents.  Low  gas  conversion  leads  to  both  economic  ineffi¬ 
ciency  and  potentially  hazardous  operations,  since  the 
source  gases  are  often  expensive  and  extremely  toxic. 
To maintain intra-wafer uniformity in MWRs with the
wafers  perpendicular  to  the  reactant  flow,  the  deposi¬ 
tion  rate  must  be  much  lower  than  the  characteristic 
transport  rate  from  the  wafer  edge  to  center.  Lower 
deposition rates result in longer processing time to
deposit films of given thickness.

In  an  effort  to  avoid  the  uniformity  problems  in 
MWRs,  the  microelectronics  industry  has  used  sin¬ 
gle  wafer  reactors  (SWRs).  These  have  the  advan¬ 
tage  that  much  higher  conversion  of  the  expensive  and 
toxic  reactants  can  be  obtained  without  violating  uni¬ 
formity  constraints.  These  high  deposition  rates  are 
necessary  for  SWRs  to  achieve  throughput  parity  with 
MWRs  with  the  same  film  product  specifications  or 
process  constraints.  An  important  issue  arising  in  this 
approach  is  that  the  step  coverage  in  patterned  re¬ 
gions  of  wafer  degrades  as  deposition  rate  is  increased. 
Thus,  maintaining  high  step  coverage  under  high  depo¬ 
sition  rate  is  a  big  challenge  in  designing  rapid  thermal 
LPCVD  processes.  Cale  and  coworkers  have  suggested 
a  novel  programmed  rate  process  protocol  for  a  SWR 
LPCVD  process  (PRCVD)  that  can  be  used  to  de¬ 
crease  the  required  deposition  time  and  hence  increase 
throughput  subject  to  a  given  step  coverage  constraint 
[1,  2]. 

In  this  paper,  we  employ  optimal  control  theory  to 
develop  an  alternative  process  protocol  in  an  effort  to 
minimize  the  processing  time,  subject  to  the  same  step 
coverage  constraint.  In  a  similar  fashion  as  in  the 
PRCVD  approach,  we  use  the  processing  temperature 
as  the  control,  or  manipulated,  variable  while  keeping 
all  other  conditions  constant.  Our  simulation  results 
show  that  the  temperature  profile  obtained  by  solv¬ 
ing  the  associated  optimal  control  problem  can  yield 
even  further  reduction  of  the  processing  time  than  the 
PRCVD  approach. 

2.  Modeling 

The continuum diffusion reaction model has been used
in the recent literature to predict step coverage in
LPCVD [1, 2]. Raupp and Cale [4] derived the
equations describing time-dependent simultaneous
heterogeneous reaction and Knudsen diffusion which apply to
deposition through a single heterogeneous reaction of
arbitrary kinetics in a feature of arbitrary symmetric
cross section. The coordinate system for the feature
is a moving coordinate system with the origin at the
center of the feature mouth during deposition. Such a
coordinate system for a trench is shown in Figure 1,
where X(t) is the film thickness in the bottom of the
trench at time t; L(t) is the film thickness at the mouth
of the trench at time t; $Z_1 = H(t)$ is the instantaneous
trench depth; W(Z, t) is the instantaneous width of the
Figure 1: Cross section of a feature during deposition


Dimensionless Variable          Definition

axial distance                  $\xi = Z/H(t)$
time                            $\tau = t D_{R0}/H_0^2$
feature depth                   $\mathcal{H}(\tau) = H(t)/H(0)$
feature cross sectional area    [illegible]
feature perimeter               [illegible]
feature width                   [illegible]
Knudsen diffusivity             [illegible]
concentration                   [illegible]
rate of reaction                $G_j(\xi, \tau) = R_j(p(Z,t), T(t)) / R_j(p(0,0), T(0))$
solid density                   $\rho_{sk} = C_{sk}/C_{R0}$

trench at depth Z; $W_0$ is the initial width of a trench;
and $H_0$ is the initial trench depth.

We can assume [2] that deposition occurs under
conditions such that the rate depends only on the
concentration of the limiting reactant, the feature is spatially
isothermal, and that surface diffusion and radial or
lateral gas concentration gradients are negligible. It is,
therefore, reasonable to treat the molecular transport
as a one-dimensional process in which species flux
is expressed in terms of local concentration gradients
and Knudsen diffusion coefficients. The expressions
for Knudsen diffusivity are based on a cross-sectional
average for idealized feature geometry, e.g., infinitely
long rectangular trenches and cylindrical contact holes.
Here, we only consider the infinitely long rectangular
trench model, although the same methodology is
applicable to cylindrical contact holes as well. For infinitely
long rectangular trenches, [2, 6, 7] give the following
estimate of $D_i$ (instantaneous, cross sectional-averaged,
local Knudsen diffusivity of a gaseous species i):


$$ D_i(Z,t) = \left[ \frac{8 K_B T(t)}{\pi m_i} \right]^{0.5} \frac{H(t)}{4} \left[ \frac{18 + 7\alpha(Z,t)}{18 + 16\alpha(Z,t) + 2\alpha^2(Z,t)} \right] \tag{1} $$

where $K_B$ is the Boltzmann constant, T is the
temperature (in K), $m_i$ is the molecular mass of i, and
$\alpha$ is the instantaneous, local aspect ratio, given by
$\alpha(Z,t) = H(t)/W(Z,t)$. Using the notation:

•  $C_i(Z,t)$: instantaneous, local concentration of species i;

•  A and P: feature sectional area available for
molecular flow and feature perimeter at position Z and time
t, respectively;

•  $p_i(Z,t)$: instantaneous partial pressure of species i
at position Z and time t (in torr);

•  $R_j$: instantaneous rate of the j-th heterogeneous
chemical reaction based on local conditions;

•  $C_i(0,t) = p_i(0,t)/(R_g T(t))$ ($R_g$ is the ideal gas constant);

and the dimensionless variables defined in Table 1, the
model equations in dimensionless form are as
follows [1]:

Balance  of  species  i: 


-4*ARiP(f,  r)  Jasl  vi»^i(£»  r)^i(r) 

(2) 

Initial  conditions: 


Table  1:  Definition  of  Dimensionless  Variables 


Boundary  conditions: 


0i(°,r)  -  -4°;°1W 

-  .Co.  M  0.0)  Wr) 

■(0)^0, 0)  VH(i.r) 


W 


where  0  <  £  <  6,  r  >  0,  and  &  =  1.  <f>j  is  referred  to 
as  the  step  coverage  modulus  for  the  reaction  j  and  is 
given  by 


g3(o)i>(o,o)Jy(p(o.<).r(0)  ,5) 

~  .4(0,  0 )DRoCRo 

Finally  Xm  denotes  the  partial  pressure  ratio  and  is 
defined  as  Xju  = 

The LPCVD process of deposition of silicon dioxide
by TEOS pyrolysis involves a single gaseous reactant
and deposition of a single solid species SiO2. The
reaction stoichiometry and rate expression for this process
are [8]:

TEOS → SiO2 + products

$$ R(p,T) = k_0 \exp\left( -\frac{E_a}{R_g T} \right) p^N $$

where $R_g$ is the ideal gas constant and the temperature T
and partial pressure p have units of Kelvin and torr,
respectively. The activation energy $E_a$ is 195 kJ/mol and
$k_0 = 38.77$, while the order of reaction is N = 0.5.
Finally, in dimensionless form, the model equations for
LPCVD of SiO2 from TEOS in long rectangular trenches
become [2, 4]:

Species  balance  for  TEOS: 

<ti _ i_  A  (vw— ^  -  —  fi  -  -1  (V 

dr  -  ww2  d(  (  dt)  w  V  p) 


Boundary  conditions: 


*(0,r) 


p<o,tmo) 


(3) 


where G is the dimensionless rate of reaction, $\phi$ is the
step coverage modulus given by


2ffa(Q)it(p(0,0,T(f)) 
^T)~  C(0,0)D(0,0)W(0,0) 


(9) 


*(«.0)  =  1  ;  if(0)  =  l 


(3) 


and $\alpha_0$ is the initial aspect ratio, given by $\alpha_0 = H_0/W_0$.
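
To give a feel for how strongly the TEOS deposition rate responds to temperature, the following sketch evaluates an Arrhenius-type rate with the activation energy (195 kJ/mol) and reaction order (0.5) quoted above. The functional form k0·exp(−Ea/(Rg·T))·p^N is an assumption about the rate expression, and because the units of the pre-exponential factor are not stated, only rate ratios are computed, in which that factor cancels. The function names are hypothetical.

```python
import math

R_GAS = 8.314        # ideal gas constant, J/(mol K)
E_A = 195.0e3        # activation energy from the text, J/mol
ORDER = 0.5          # reaction order from the text


def relative_rate(p_torr, temp_k):
    """Rate divided by the pre-exponential factor k0 (assumed
    Arrhenius form with a power-law pressure dependence)."""
    return math.exp(-E_A / (R_GAS * temp_k)) * p_torr ** ORDER


def rate_ratio(temp_new, temp_old, p_torr=1.0):
    """Factor by which the rate changes when the temperature moves
    from temp_old to temp_new at fixed partial pressure."""
    return relative_rate(p_torr, temp_new) / relative_rate(p_torr, temp_old)


print(rate_ratio(1010.0, 1000.0))
```

The steep exponential dependence on temperature, compared with the square-root dependence on pressure, is consistent with the choice of temperature as the manipulated variable in the programmed rate and optimal control protocols.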
3.  Programmed Rate CVD (PRCVD)

In common commercial deposition processes, the wafer
temperature and partial pressures of all reactants at
the wafer surface are usually held constant. Therefore,
the rate of deposition remains constant at the wafer
surface during the entire process. Such processes are
referred to as constant rate chemical deposition
(CRCVD) processes [2, 3]. Since the process parameters
remain constant in CRCVD, the model parameters (step
coverage modulus and reactant partial pressure ratios
at the wafer surface) also remain constant. The
objective of a PRCVD process protocol is to vary the
deposition rate as the deposition progresses, by
changing one or more of the process parameters, so as to
decrease the deposition time, subject to a specific
final step coverage constraint. As mentioned in the
Introduction, such a protocol is of particular interest in
SWRs, though in principle the same concept can be
applied to other reactor configurations [2]. The PRCVD
process path employed in [1] consists of two distinct
legs: (1) A programmed rate leg during which the
deposition rate at the wafer surface is adjusted so as to
keep the instantaneous differential step coverage $\Gamma$ (see
below) constant. (2) A constant rate leg during which
the process is continued at a constant deposition rate.
For the programmed rate portion of the process, the
deposition rate is adjusted by changing the
temperature since this is the variable having the most dominant
effect on the reaction rate. The point at which the
process is changed from the programmed rate leg to the
constant rate leg is referred to as the switching point
(SP). The switching point is defined to be the time
at which a specified percentage of feature closure is
obtained. Therefore, a PRCVD process path is
determined by selecting $\Gamma$ and SP. A judicious selection of $\Gamma$
and SP can lead to a higher average rate of deposition
during the process and, hence, to significant savings
in processing time compared with the corresponding
CRCVD process yielding the same step coverage.

To demonstrate the use of the PRCVD approach,
we consider the case of SiO2 deposition from TEOS
decomposition in a rectangular trench. Since the
reaction rate at the wafer surface varies during PRCVD,
the step coverage modulus $\phi$ also varies. Thus, the
problem of determining a rate path is equivalent to
finding a path for the step coverage modulus $\phi$. For
this process, $\phi$ can be expressed as [1]

*(*)  =  ™lRf°R»(p,T)  (10) 

i-to-Uxo-L  lO 

The  above  equation  reveals  that  the  step  coverage 
modulus  can  be  varied  by  changing  either  the  partial 
pressure  of  TEOS  at  the  wafer  surface,  or  the  wafer 
temperature,  or  both,  during  the  programmed  rate  leg. 
We  only  discuss  the  case  where  the  wafer  temperature 
is  varying  while  maintaining  the  TEOS  partial  pressure 
at  the  wafer  surface  constant. 

Further,  we  can  assume  [2]  that  at  any  particular 
instant  in  time,  the  gas-phase  concentration  profiles  in 
the  feature  are  the  steady-state  profiles  that  would  ex¬ 
ist  for  the  reaction  conditions  prevailing  at  that  instant 
and  that  the  molar  density  of  the  gas  is  negligible  in 
comparison  to  that  of  the  solid  film,  i.e.,  9  /?.  Using 

these  assumptions,  the  species  balance  for  TEOS  (7), 


and  the  corresponding  boundary  conditions  become: 

r)W«,  r)^lll  -  rf(r )G(*.  r)V(r)  =  0  (11) 


m  ’ 


30(6,  r)  W(r)  xG(6,r) 

“  (12) 


The  reactant  concentration  profile  in  the  feature  at 
any  instant  in  time  can  be  determined  by  solving  the 
above  differential  equation  and  is  given  by 

«({. r)  =  «(0,  r)  -  *( r)«*(r)  [{7(6,  r)  + 


4  C  (i.r)rfi 


(*.<•)  ~  So 


where  £(£ ,  r)  =  /0f  G(s,  r)ds 

The  instantaneous  differential  step  coverage  T  is 
now  defined  by  T  =  Since  L  = 

R(p(Z,T),T(t)),  X  =  R(p(0,T),T(t))  and  2X(t)L  + 
W(Z,t)  =  Wo  (see  Fig.  1),  we  obtain 

r_ta,r)l" 

r-[*(0,r)J  (H) 

During the programmed rate leg of the PRCVD, $\Gamma$ is
held constant and the temperature is determined by
solving (10), (13) and (14) using a first order
approximation. During the constant rate leg (after SP), the
temperature is held constant and equal to the
temperature achieved at the end of the programmed rate
leg. A reduction of processing time with the PRCVD
approach versus the CRCVD one depends on the
selected values of the process path parameters $\Gamma$ and SP.
For a given step coverage, however, $\Gamma$ and SP may be
found by trial and error so as to minimize processing
time [1, 2].

4.  Optimal  Control  CVD  (OCCVD) 

The first step in applying optimal control theory to
the SiO2 deposition by TEOS decomposition is to
describe the process model by a set of ordinary
differential equations. Then, the optimal control problem
for the CVD process becomes one of determining the
temperature trajectory (in this case the control input)
subject to the final step coverage constraints such that
the processing time is minimized.

From the definition of the instantaneous step
coverage $\Gamma$, and since the final step coverage is evaluated as
the ratio of the final film thickness at the base of the
feature and the final thickness at the wafer surface, we
obtain the following simplified model of the process:

$$ \dot{L} = R(p(Z,t), T) \;;\quad \dot{X} = \Gamma(t) \dot{L} \tag{15} $$

The first order perturbation solution of this
pseudo-steady-state model provides a reasonable estimate of
the actual concentration and deposition profiles for the
range of $\phi$ values (reaction rates) which yield uniform
deposition over the wafer.1 The first order
perturbation solution to (13) for the dimensionless
concentration profile in a trench can be written as

1 Simulations of the full model (2), (7) support this claim [2].
where $\alpha = \alpha(Z_1, t)$. Combining (10), (13) and (14), we
obtain the following expression for $\Gamma$


T(t)  = 


fi(p(0,  t).T(t)) 


l  N 


<p(o)  = 


18  +  16or(Z,  t)  +  2a3(2T,  f) 

18+7o(Z,i) 


(17) 


The above equation reveals that $\Gamma$ depends only on
the variables T and $\alpha$, since R(p(0,t),T(t)) depends
only on the temperature T(t) when the partial pressure
p(0,t) is constant. Thus, the optimal control problem can be
cast as follows:

Optimal Control Problem: For the system model
(15) find an optimal control T, to minimize the
processing time $t_f$ subject to the constraints

$$ X(0) = L(0) = 0 \;;\quad X(t_f)/L(t_f) \ge SC \;;\quad L(t_f) = L_* \tag{18} $$

where $\Gamma$ is given by equation (17), and SC, $L_*$ are the
desired step coverage and final thickness.

To solve the above optimal control problem we may employ numerical techniques to determine the optimal control trajectories as the solution to a two-point boundary-value problem. One such general method is the so-called Variation of Extremals [9]. However, since in its standard form this method is not directly applicable to our case, a suitable modification needs to be derived. To achieve this, let us begin by employing optimal control theory to determine necessary conditions for a control trajectory to be optimal. In general, the problem is to find an admissible control $U^*$ in the feasible set $\Omega$ that causes the system

$$\dot{X}(t) = a(X(t),U(t),t)$$

to follow an admissible trajectory $X^*(t)$ that minimizes the performance index

$$J(U) = h(X(t_f),t_f) + \int_{t_0}^{t_f} g(X(t),U(t),t)\,dt$$

and satisfies the boundary conditions $X(t_0) = X_0$, $X(t_f) = X_f$, where $t_0$ is the specified initial time and $t_f$ is the unknown final time. In our case (minimum time problem), $h = 0$ and $g = 1$, while the control $U$ is the temperature $T$.2 By defining the Hamiltonian function $H \triangleq g + P^{T}a$, the necessary conditions for optimality are

$$\dot{X}^*(t) = \frac{\partial H}{\partial P}(X^*(t),U^*(t),P^*(t),t)$$

$$\dot{P}^*(t) = -\frac{\partial H}{\partial X}(X^*(t),U^*(t),P^*(t),t) \qquad (19)$$

$$H^*(t) = \min_{U(t)\in\Omega} H(X^*(t),U(t),P^*(t),t)$$

for all $t \in [t_0,t_f]$, with boundary conditions

$$X^*(t_0) = X_0 \; ; \quad X^*(t_f) = X_f \; ; \quad H^*(t_f) = 0 \qquad (20)$$

2 The selection of the temperature as the control variable, instead of, e.g., partial pressure, is considered here since it is easier to manipulate in practice and leads to simpler expressions.

Since $P(t_0)$ is unknown, we can guess a value $P^{(0)}(t_0)$ for the initial costate and use it to numerically integrate (19) from $t_0$ to $t_f$. Under the variation of extremals approach, the observed values of $P(t_f)$ are then used to systematically adjust the guessed values of $P(t_0)$. One technique for making systematic adjustments of the initial costate values is based on Newton's method for finding roots of nonlinear equations [9]. Thus, given $n$ state equations and $n$ costate equations, the update law for $P(t_0)$ is given by

$$P^{(i+1)}(t_0) = P^{(i)}(t_0) - \left[V_p(t_f)\right]_i^{-1}\left[P(t_f)\right]_i \qquad (21)$$

$$\dot{V}_x = \left[\frac{\partial^2 H}{\partial P\,\partial X}\right]_i V_x + \left[\frac{\partial^2 H}{\partial P^2}\right]_i V_p \qquad (22)$$

$$\dot{V}_p = -\left[\frac{\partial^2 H}{\partial X^2}\right]_i V_x - \left[\frac{\partial^2 H}{\partial X\,\partial P}\right]_i V_p \qquad (23)$$

$$V_x(t_0) = 0 \; ; \quad V_p(t_0) = I \qquad (24)$$

where $V_x(P^{(i)}(t_0),t)$ and $V_p(P^{(i)}(t_0),t)$ are the matrices of partial derivatives of the components of $X(t)$ and $P(t)$ with respect to each of the components of $P(t_0)$, evaluated at $P^{(i)}(t_0)$. Notice that $V_p$ and $V_x$ are needed only at the terminal time $t_f$, while the above derivations assume continuity of partial derivatives. The notation $[\,\cdot\,]_i$ means that the enclosed terms are evaluated on the $i$-th trajectory.

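Since $P^*(t_f)$ is not specified in our case, we cannot directly apply the standard variation of extremals algorithm. In order to develop a suitable modification, we consider the observed final state $X(t_f)$ as the result of the choice of initial costates $P(t_0)$ and final time $t_f$. We then guess $P^{(0)}(t_0)$ and $t_f^{(0)}$, and update them so that the observed $X(t_f)$ converges to the desired value. Thus, we finally obtain the following equations for updating $P^{(i+1)}(t_0)$ and $t_f^{(i+1)}$:

$$\Delta X_f(i) = V_x\,\Delta P(i) + \dot{X}\,\Delta t_f(i)$$

$$0 = \frac{\partial H}{\partial X}\,\Delta X_f(i) + \frac{\partial H}{\partial P}\,V_p\,\Delta P(i)$$

$$\begin{bmatrix} \Delta P(i) \\ \Delta t_f(i) \end{bmatrix} = \begin{bmatrix} V_x & \dot{X} \\ \frac{\partial H}{\partial P} V_p & 0 \end{bmatrix}^{-1} \begin{bmatrix} \Delta X_f(i) \\ -\frac{\partial H}{\partial X}\,\Delta X_f(i) \end{bmatrix} \qquad (25)$$

where $\Delta X_f(i) = X_f - X^{(i)}(t_f)$, $\Delta P(i) = P^{(i+1)}(t_0) - P^{(i)}(t_0)$, $\Delta t_f(i) = t_f^{(i+1)} - t_f^{(i)}$, and all quantities are evaluated at $t_f^{(i)}$, $P^{(i)}(t_0)$. The iteration is terminated when $\|X_f - X(t_f^{(i)})\| < \gamma$ is satisfied.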
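
The update in (25) amounts to solving a small linear system for the corrections to the initial costate and the final time. The sketch below illustrates one such step in plain Python, assuming the linearized form $\Delta X_f = V_x \Delta P + \dot{X}\Delta t_f$ and $0 = (\partial H/\partial X)\Delta X_f + (\partial H/\partial P)V_p\Delta P$ as read from (25). The function names, the argument layout and the small Gaussian-elimination helper are illustrative assumptions, not the authors' implementation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in a[r]] + [float(b[r])] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-14:
            raise ValueError("singular system")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update_step(v_x, v_p, x_dot, dh_dx, dh_dp, dx_f):
    """Return (dP, dt_f), the corrections to P(t0) and t_f for one iteration.

    v_x, v_p     : n x n sensitivity matrices evaluated at t_f
    x_dot        : state derivative at t_f (length n)
    dh_dx, dh_dp : gradients of the Hamiltonian at t_f (length n each)
    dx_f         : X_f - X(t_f), the terminal state error (length n)
    """
    n = len(dx_f)
    # Rows 0..n-1: V_x dP + X_dot dt_f = dX_f
    a = [list(v_x[r]) + [x_dot[r]] for r in range(n)]
    b = list(dx_f)
    # Last row: (dH/dP)^T V_p dP = -(dH/dX)^T dX_f
    a.append([sum(dh_dp[k] * v_p[k][c] for k in range(n)) for c in range(n)] + [0.0])
    b.append(-sum(dh_dx[k] * dx_f[k] for k in range(n)))
    sol = solve_linear(a, b)
    return sol[:n], sol[n]
```

In an actual iteration the sensitivities would come from integrating (22)-(24) along the current trajectory, and the step would be repeated until the terminal error norm falls below the tolerance $\gamma$.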

It should be emphasized that the above procedure for computing the optimal trajectory does not account for the possibility that, during the iterations, the instantaneous step coverage $\Gamma$ may become a complex number, especially near closure. Although a systematic way to prevent this situation is feasible (e.g., by introducing an additional constraint in the optimal control problem), for the sake of simplicity we adopted a different approach. That is, we use a strategy similar to that of the PRCVD process, in that we find the optimal control only during the first leg so as to minimize the corresponding time, while for the second leg the processing time is fixed and the temperature is held constant.


5.  Results  and  Discussion 
To compare the results of CRCVD, PRCVD and OCCVD, we consider the model of SiO2 deposition by TEOS decomposition, as described by (15) with $N = 0.5$. The desired step coverage considered here is $SC = 0.96$. For best results in the PRCVD case, [1] suggests the following choice of parameters: $\Gamma = 0.9775$ for the programmed rate leg and $SP = 0.899$. With these values the temperature under PRCVD decreases until the switching point SP and remains constant thereafter. The instantaneous step coverage $\Gamma$ is held constant before and decreases after the switching point. The total process time achieved under PRCVD, for 98 percent closure, is 386 sec. This value reflects significant savings over the CRCVD approach which, for the same SC and closure, requires 729 sec (Fig. 2).

On the other hand, under the OCCVD approach, we first select the time of the second leg as 100 seconds.3 With the temperature held constant in this leg, equal to the final temperature of the first leg in the previous iteration, (15) are integrated backwards in time to obtain the final state $X(t_f)$ at the end of the first leg. This final state is then substituted in (25) to yield the next estimate for $P(t_0)$ and $t_f$. These values are, in turn, used to integrate the state equations (19) forward in time with the input $U$ (temperature) computed accordingly. The procedure is repeated until convergence is achieved. The OCCVD approach results in a first leg time of 178 seconds and a total time, for 98 percent closure, of 278 seconds. Compared to the PRCVD approach, OCCVD yields a 28 percent reduction of processing time (Fig. 2).
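
The quoted savings follow directly from the reported process times. A short Python check, using only the three times stated above (729, 386 and 278 seconds); the function and variable names are chosen here for illustration:

```python
def percent_saving(t_reference, t_new):
    """Relative reduction in processing time, in percent."""
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return 100.0 * (t_reference - t_new) / t_reference


# Total process times reported in the text, in seconds.
T_CRCVD, T_PRCVD, T_OCCVD = 729.0, 386.0, 278.0

prcvd_vs_crcvd = percent_saving(T_CRCVD, T_PRCVD)  # PRCVD relative to CRCVD
occvd_vs_prcvd = percent_saving(T_PRCVD, T_OCCVD)  # OCCVD relative to PRCVD
occvd_vs_crcvd = percent_saving(T_CRCVD, T_OCCVD)  # OCCVD relative to CRCVD
```

The second value rounds to the 28 percent reduction stated in the text.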

The above results demonstrate that significant savings in processing time can be obtained, without compromising the step coverage constraint, by using optimal control theory to compute temperature trajectories for a CVD process. Another advantage of employing optimal control theory in such problems is that it helps to reduce, if not eliminate, the “judicious selection” of critical parameters (e.g., $\Gamma$ and $SP$ in the PRCVD approach). On the other hand, the price paid for these improvements is related to the increased computational complexity of the solution and the difficulty of implementing the optimal trajectories on the actual process. Without attempting to completely resolve these issues here, we note that the former is of lesser significance, especially in view of the computational power of modern computers. Furthermore, the trajectory implementation problem can be alleviated to a large extent by appropriately designing the local (inner-loop) temperature controllers. In the same vein it should be pointed out that, in contrast to the PRCVD approach, optimal control theory can easily account for the constraints imposed by the temperature dynamics in the solution. In such a formulation, the inner-loop temperature set-point would be used as the control variable in the optimal control problem, with an additional constraint (in the form of a simple dynamical model or just a temperature derivative constraint) imposed by the bandwidth of the local temperature closed-loop. Thus, although the exact relationship between the desired and the actual wafer temperatures is very complicated or even unknown, the proper use of the local feedback controller can ensure the successful implementation of the optimal temperature trajectory. This issue is anticipated to be of importance in our future work, involving the experimental validation of the PRCVD and OCCVD approaches, since the maximum temperature rate capabilities of the available reactor (a “Spectrum 202”) are below the unconstrained optimal trajectory rates.

3 This value is somewhat arbitrary and could be selected more systematically by performing a one-parameter optimization with respect to the second leg time. However, for the sake of simplicity, this approach is not adopted in the present study.

Figure 2: Temperature trajectory comparison under CRCVD, PRCVD and OCCVD

Finally, a more subtle problem arises from the consideration of the “quality” of the deposited film in terms of grain size and adhesion properties (depending largely on the processing temperature). This issue is expected to impose constraints on the temperature range that need to be determined experimentally. In addition, in order to establish such an optimal trajectory generation procedure as a practical alternative to the current approaches, the various derivations and computations should be “automated” to a great extent in order to become usable by process engineers (e.g., by providing the ability to easily generate a new optimal trajectory when the processing conditions change). These issues, however, extend beyond the scope of this study and are left as a topic of future research.
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By  employing  ...  a<.!C,ipllo„  ,J  Ip-press, , re  cherni- 

”  ‘"‘  I  lPCVDI,  an  optimally  <o, . trolled  ehemieal  vapor 

totractliovysilanc  (TEOS).  Tim  ^  lhe  ,»,„6e 
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CVD  (CRCVD). 

INTRODUCTION

Two major challenges ... of device integration and cost reduction. ... reduce cost is to increase device throughput. Step coverage and film thickness uniformity are critical constraints in LPCVD. Single wafer reactors (SWRs) are becoming more common in order to obtain better uniformity. ... the same film product specifications or process constraints, high ... necessary for SWRs to maintain throughputs comparable to multiple-wafer reactors (MWRs). Unfortunately, the step coverage in patterned regions of wafers generally degrades as deposition rate is increased. Thus, the specification of ... acceptable step coverage and high deposition rate is a major ... designing LPCVD ... SWRs. This problem is ... conventional constant ... CVD (CRCVD), ... of the deposition process. ... the wafer surface during the process. ... feature aspect ratios increase, and the deposition rate during CRCVD must be low enough to ensure good step coverage as features close, where the aspect ratios are the highest. Cale and coworkers have suggested programmed rate CVD (PRCVD) for SWR LPCVD processes, in which the operating conditions are changed during the deposition in a prescribed manner [1-3]. The deposition conditions are changed in PRCVD such that the deposition rate decreases during processing as the feature aspect ratios increase. PRCVD can be used to decrease the deposition time for a given final step coverage, thereby increasing throughput.


In this paper, we employ optimal control theory to develop a new approach, referred to as optimally controlled chemical vapor deposition (OCCVD), for the specific problem of maximizing throughput for a specified step coverage. For the purpose of developing such a control strategy, we employ an approximate model based on a one-dimensional Knudsen diffusion and chemical reaction description [1-3]. The time savings achieved by OCCVD depend upon the chemistry and other process parameters, including the step coverage constraint. In this work, the temperature is changed during deposition. More generally, reactant flow rates and other process set points could be changed. Our simulation results show that the temperature trajectory obtained by solving the associated optimal control problem can yield even further reduction of the processing time than the PRCVD approach. Step coverages obtained by the OCCVD process were computed using the simulation package EVOLVE [4], in order to ensure that errors introduced through the use of the approximate model are relatively small.

MODELING 


For the purpose of developing control strategies, we treat transport as a one-dimensional process in which species fluxes are expressed in terms of local concentration gradients and Knudsen diffusion coefficients. Cale and coworkers [1-3] presented the equations which are appropriate for this simplistic model of transport and reaction at high Knudsen number. In this work, we assume that the deposition rate depends on a single species concentration, the features are spatially isothermal and there is no surface diffusion. We deal with deposition in trenches; however, the same methodology applies to vias as well. The origin of the coordinate system is at the center of the trench mouth, and moves with the surface during the deposition. Figure 1 shows such a trench, and

• $X(t)$ is the film thickness in the bottom of the trench at time $t$;

• $L(t)$ is the film thickness at the mouth of the trench at time $t$;

• $Z_b = H(t)$ is the instantaneous trench depth;

• $W(Z,t)$ is the instantaneous width of the trench at depth $Z$;

• $W_0$ is the initial width of a trench;

• $H_0$ is the initial trench depth.

The concentration of any species $i$ ... in the feature ...

$$\cdots \qquad (1)$$

where $C_i$ is the instantaneous, local concentration of species $i$. $A$ and $P$ ... available for molecular flow and the feature perimeter, respectively. ... $Z$ and time $t$, while $T(t)$ ... at time $t$. $R_j$ is the instantaneous rate ... reaction ... based on local conditions and ... stoichiometric coefficient of species $i$ in reaction $j$. $D_i$ is the instantaneous, cross-sectional-averaged, local Knudsen diffusivity of gaseous species $i$. For infinitely long rectangular trenches, Ref. [3] gives the following estimate of $D_i$:

$$D_i(Z,t) = \left(\frac{8 k_B T(t)}{\pi m_i}\right)^{0.5} \cdots\; \frac{18 + 7\,\alpha(Z,t)}{18 + 16\,\alpha(Z,t) + 2\,\alpha^{2}(Z,t)}$$

where $k_B$ is Boltzmann's constant, $T$ is the absolute temperature, $m_i$ is the molecular ... of ... and $\alpha$ is the instantaneous, local aspect ratio given by

$$\alpha(Z,t) = \frac{\cdots}{W(Z,t)}$$

The boundary conditions for the above second order partial differential equation are

$$C_i(0,t) = \frac{p_i(0,t)}{R_g T(t)} \; ; \qquad \cdots$$

where $R_g$ is the ideal gas constant.

... LPCVD ... a single gaseous reactant and deposition of a single solid species ... stoichiometry and rate expression we use for this process are [3, ...

Dimensionless variable          Definition
Axial distance                  $\xi = Z/H(t)$
Time                            $\tau = t\,D_{i0}/H_0^{2}$
Feature depth                   $\mathcal{H}(\tau) = H(t)/H_0$
Feature cross sectional area    $\mathcal{A}(\xi,\tau) = A(Z,t)/A(0,0)$
Feature perimeter               $\mathcal{P}(\xi,\tau) = P(Z,t)/P(0,0)$
Feature width                   $\mathcal{W}(\xi,\tau) = W(Z,t)/W(0,0)$
Knudsen diffusivity             $\mathcal{D}(\xi,\tau) = D_i(Z,t)/D_i(0,0)$
Concentration                   $\theta(\xi,\tau) = C_i(Z,t)/C_i(0,0)$
Rate of reaction                $G(\xi,\tau) = R_j(p(Z,t),T(t))/R_j(p(0,0),T(0))$
Solid density                   $\rho_S = C_S/C_{i0}$

Table 1: Definition of Dimensionless Variables


$$\cdots \;\rightarrow\; \mathrm{SiO_2} + \text{gaseous by-products}$$

$$R(p,T) = \cdots\,\exp\!\left(\frac{-195}{R_g T}\right) p^{0.5} \quad \left(\frac{\text{mol}}{\cdots}\right)$$

where the temperature $T$ and partial pressure $p$ have units of Kelvin and torr, respectively. The activation energy is in kJ/mol.


For the purposes of identifying the important parameters that dictate the step coverage and determining their dependence on the CVD chemistry and operating conditions, the model equations are nondimensionalized [1-3, 5], using the definitions in Table 1. With these definitions, the model equations for LPCVD of SiO2 from TEOS in long rectangular trenches become [3]:


Species balance for TEOS:

$$\cdots \qquad (7)$$

Boundary conditions:

$$\theta(0,\tau) = \frac{p(0,t)\,T(0)}{T(t)\,p(0,0)} \; ; \qquad \cdots$$

where $G$ is the dimensionless rate of reaction, $\Phi$ is the step coverage modulus given by

$$\Phi = \frac{2 H^{2}(0)\,R(p(0,t),T(t))}{C(0,0)\,D(0,0)\,W(0,0)} \qquad (10)$$

and $\alpha_0$ is the initial aspect ratio, given by $\alpha_0 = H_0/W_0$, and subscript $b$ refers to the base of the feature.

We assume that at any particular instant in time, the gas-phase concentration profiles in the feature are the steady-state profiles that would exist for the reaction conditions prevailing at that instant and that the molar density of the gas is negligible in comparison to that of the solid film, i.e., $\rho_S \gg 1$ [3]. Using these assumptions, the species balance for TEOS, given by equation (7), simplifies to:

$$\frac{\partial}{\partial \xi}\left[\mathcal{D}(\xi,\tau)\,\mathcal{W}(\xi,\tau)\,\frac{\partial \theta}{\partial \xi}\right] - \Phi(\tau)\,G(\xi,\tau)\,\mathcal{H}^{2}(\tau) = 0 \qquad (11)$$

We discuss the simple case where the wafer temperature is varied while maintaining the TEOS partial pressure at the wafer surface constant. The boundary conditions for the above second order differential equation are

$$\cdots \qquad (12)$$

and

$$\frac{\partial \theta(\xi_b,\tau)}{\partial \xi} = \cdots \qquad (13)$$

The reactant concentration profile in the feature at any instant in time can be determined by solving the above differential equation and is given by

$$\theta(\xi,\tau) = \theta(0,\tau) - \cdots \qquad (14)$$

where

$$\mathcal{E}(\xi,\tau) = \int_0^{\xi} G(\xi',\tau)\,d\xi' \qquad (15)$$

PROGRAMMED RATE CHEMICAL VAPOR DEPOSITION


The objective of a PRCVD process protocol is to vary the deposition rate as the deposition progresses, by changing one or more of the process parameters, so as to decrease the deposition time, subject to a specific final step coverage constraint. The PRCVD process path employed in References [3] and [5] consists of two distinct legs:

1. A programmed rate leg during which the deposition rate at the wafer surface is adjusted so as to keep the instantaneous differential step coverage $\Gamma$ (see below) constant,

2. A constant rate leg during which the process is continued at a constant deposition rate.

The point at which the process is changed from the programmed rate leg to the constant rate leg is referred to as the switching point (SP). The switching point is defined to be the time at which a specified percentage of feature closure is obtained. Therefore, a PRCVD process path is determined by selecting $\Gamma$ and SP. A judicious selection of $\Gamma$ and SP can lead to a higher average rate of deposition during the process and, hence, to significant savings in processing time compared with the corresponding CRCVD process yielding the same step coverage.


We demonstrate the use of this PRCVD approach by considering the case of SiO2 deposition from TEOS decomposition in a rectangular trench. Since the reaction rate at the wafer surface varies during PRCVD, the step coverage modulus $\Phi$, which is proportional to the instantaneous deposition rate at the wafer surface (see equation (10)), also varies. Thus, the problem of determining a rate path is equivalent to finding a path for the step coverage modulus $\Phi$. For this process, $\Phi$ can be expressed as

$$\Phi = \frac{2 H_0^{2} R_g T_0}{W_0\,D_{i0}\,p_{i0}}\; k_0 \exp(\cdots)\,\cdots \qquad (16)$$


The above equation reveals that the step coverage modulus can be varied by changing either the partial pressure of TEOS at the wafer surface, or the wafer temperature, or both, during the programmed rate leg. In this work, the deposition rate is adjusted by changing the temperature, since this has a large effect on reaction rate.
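
To give a feel for how strongly temperature affects the rate, the sketch below evaluates the Arrhenius factor of the rate expression at fixed partial pressure, assuming the activation energy of 195 kJ/mol that appears in the rate expression above. The two temperatures used in the usage note are hypothetical values picked for illustration; they are not operating points from the paper.

```python
import math

GAS_CONSTANT = 8.314          # J/(mol K)
ACTIVATION_ENERGY = 195.0e3   # J/mol (195 kJ/mol, as read from the rate expression)


def rate_ratio(t_ref, t_new):
    """Ratio R(p, t_new) / R(p, t_ref) at the same partial pressure p.

    Only the exponential (Arrhenius) factor changes with temperature, so the
    pre-exponential constant and the p**0.5 term cancel in the ratio.
    """
    if t_ref <= 0 or t_new <= 0:
        raise ValueError("temperatures must be positive (Kelvin)")
    exponent = -(ACTIVATION_ENERGY / GAS_CONSTANT) * (1.0 / t_new - 1.0 / t_ref)
    return math.exp(exponent)
```

For example, `rate_ratio(1000.0, 990.0)` shows that, under these assumptions, lowering a hypothetical 1000 K wafer temperature by 10 K reduces the deposition rate by roughly a fifth.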


The instantaneous differential step coverage $\Gamma$ is defined by $\Gamma = \dot{X}/\dot{L}$. Because $\dot{L} = R(p(0,t),T(t))$, $\dot{X} = R(p(Z_b,t),T(t))$ and $2X(t) + W(Z,t) = W_0$, see Figure 1, we obtain

$$\Gamma = \frac{\cdots}{\theta(0,\tau)} \qquad (17)$$


Since a path conforming to PRCVD during the programmed rate leg of the process corresponds to constant $\Gamma$, equation (14) results in the following expression for $\Phi(\tau)$:

$$\Phi(\tau) = \frac{\theta(0,\tau)\,(1 - \Gamma^{2})}{\cdots} \qquad (18)$$

Thus, during the programmed rate leg of PRCVD, the temperature is determined from equations (18) and (16), using a first order approximation described in Ref. [...]. During the constant rate leg (after SP), the temperature is held constant and equal to the temperature achieved at the end of the programmed rate leg. The reduction of processing time using PRCVD versus CRCVD depends on the selected values of the process path parameters $\Gamma$ and SP. For a given step coverage, however, $\Gamma$ and SP may be found by trial and error so as to minimize processing time [3, 5]. One such method is to first fix $\Gamma$ and then choose the value of SP which achieves the desired final step coverage. A minimum time is then obtained by searching over the range of $\Gamma$ for the one yielding the minimum total time. Figure 2 shows one of the possible PRCVD paths for SiO2 deposition by TEOS decomposition.
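
The trial-and-error selection described above can be pictured as a nested search. The following Python sketch is only an illustration of that idea: `simulate` is a hypothetical callback standing in for a process simulation that returns the final step coverage and total time for a given pair of path parameters, and picking the fastest feasible switching point for each fixed value of the first parameter is an assumption about how "achieves the desired final step coverage" is applied.

```python
def best_prcvd_path(simulate, gammas, switch_points, sc_target):
    """Search a grid of PRCVD path parameters for the minimum total time.

    simulate(gamma, sp) -> (step_coverage, total_time) is a user-supplied,
    hypothetical process model. Returns (total_time, gamma, sp) for the best
    feasible path, or None if no grid point meets the step coverage target.
    """
    best = None
    for gamma in gammas:
        # Inner step: for this fixed gamma, keep switching points that meet the target.
        feasible = []
        for sp in switch_points:
            step_coverage, total_time = simulate(gamma, sp)
            if step_coverage >= sc_target:
                feasible.append((total_time, sp))
        if not feasible:
            continue
        total_time, sp = min(feasible)
        # Outer step: keep the gamma giving the smallest total time so far.
        if best is None or total_time < best[0]:
            best = (total_time, gamma, sp)
    return best
```

A grid search of this kind scales with the number of simulated paths, which is one practical motivation for replacing the manual selection with the optimal control formulation that follows.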

OPTIMALLY  CONTROLLED  CHEMICAL  VAPOR  DEPOSITION 

Problem Formulation

To apply optimal control theory to the CVD process, we describe the process model as a set of ordinary differential equations. Since the step coverage is evaluated as the ratio of the thickness at the base of the feature to the thickness at the wafer surface, we use the equations which describe the growth rate at the wafer surface and the growth rate at the base of the trench as the process model. For SiO2 deposition by TEOS decomposition, the simplified model of the process is written as

$$\frac{dL}{dt} = \cdots \qquad (19)$$

$$\frac{dX}{dt} = \Gamma(t)\,\frac{dL}{dt} = \Gamma(t)\,\cdots\,\exp(\cdots) \qquad (20)$$

where equation (20) comes from the definition of the instantaneous step coverage $\Gamma$.

We can obtain the expression for $\Gamma$ by using the first order perturbation solution. The first order perturbation solution of the pseudo-steady-state model provides a reasonable estimate of concentration and deposition profiles for the range of $\Phi$ values (reaction rates) which yield conformal depositions, since the concentration does not vary by more than a few percent from feature mouth to feature base [3, 5].

The first order perturbation solution to equation (14) for the dimensionless concentration profile in a trench can be written as

$$\theta(1,\tau) \approx \theta(0,\tau) - \cdots \qquad (21)$$

where $\alpha = \alpha(Z_b,t)$. Combining equations (14), (18) and (21), we obtain the following expression for $\Gamma$

$$\Gamma = \cdots \qquad (22)$$

where

$$\varphi(\alpha) = \frac{18 + 16\,\alpha(Z,t) + 2\,\alpha^{2}(Z,t)}{18 + 7\,\alpha(Z,t)} \qquad (23)$$

The above equations reveal that $\Gamma$ depends on the variables $T$ and $\alpha$, since $R(p(0,t),T(t))$ depends only on the temperature $T(t)$ when the partial pressure $p(0,t)$ is constant. Thus, the optimal control problem can be cast as follows:

For the system model equations (19) and (20), find an optimal control $T$ (temperature) that minimizes the processing time $t_f$ subject to the constraints

$$X(0) = L(0) = 0 \qquad (24)$$

$$\frac{X(t_f)}{L(t_f)} \ge \text{a given step coverage} \qquad (25)$$

$$L(t_f) = \text{a given thickness} \qquad (26)$$

where $\Gamma$ is given by equation (22).
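
The geometric factor $\varphi(\alpha)$ that enters the expression for $\Gamma$ is a simple rational function of the local aspect ratio. A minimal Python sketch, assuming the form $(18 + 16\alpha + 2\alpha^2)/(18 + 7\alpha)$ as read from equation (23); the function name is chosen here for illustration:

```python
def phi(alpha):
    """Aspect-ratio factor (18 + 16 a + 2 a^2) / (18 + 7 a) for a >= 0."""
    if alpha < 0:
        raise ValueError("aspect ratio must be non-negative")
    return (18.0 + 16.0 * alpha + 2.0 * alpha ** 2) / (18.0 + 7.0 * alpha)
```

The factor equals 1 for a vanishing aspect ratio and grows as the feature closes and the aspect ratio increases, which is consistent with step coverage becoming harder to maintain near closure.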

Method 


Since the system model is described by nonlinear ordinary differential equations, we do not attempt to solve the nonlinear optimal control problem analytically. Instead, we employ numerical techniques to determine the optimal control trajectories as the solution to a two-point boundary-value problem. One such general method is the so-called variation of extremals [6]. However, since in its standard form this method is not directly applicable to the optimal control problem for the CVD process, a suitable modification needs to be derived. In order to achieve this, we begin by employing optimal control theory to determine necessary conditions for a control trajectory to be optimal. In general, the problem is to find an admissible control $U^*$ in the feasible set $\Omega$ that causes the system

$$\dot{X}(t) = a(X(t),U(t),t) \qquad (27)$$

to follow an admissible trajectory $X^*(t)$ that minimizes the performance index

$$J(U) = h(X(t_f),t_f) + \int_{t_0}^{t_f} g(X(t),U(t),t)\,dt \qquad (28)$$

and satisfies the boundary conditions $X(t_0) = X_0$ and $X(t_f) = X_f$, where $t_0$ is the specified initial time and $t_f$ is the unknown final time.

By defining the Hamiltonian function

$$H(X(t),U(t),P(t),t) = g(X(t),U(t),t) + P^{T}(t)\,[a(X(t),U(t),t)] \qquad (29)$$

the necessary conditions for optimality can be written as follows [6]

$$\dot{X}^*(t) = \frac{\partial H}{\partial P}(X^*(t),U^*(t),P^*(t),t) \qquad (30)$$

$$\dot{P}^*(t) = -\frac{\partial H}{\partial X}(X^*(t),U^*(t),P^*(t),t) \qquad (31)$$

$$H(X^*(t),U^*(t),P^*(t),t) = \min_{U(t)\in\Omega} H(X^*(t),U(t),P^*(t),t) \qquad (32)$$

for all $t \in [t_0,t_f]$, with boundary conditions

$$X^*(t_0) = X_0 \qquad (33)$$

$$X^*(t_f) = X_f \qquad (34)$$

$$H(X^*(t_f),U^*(t_f),P^*(t_f),t_f) + \frac{\partial h}{\partial t}(X^*(t_f),t_f) = 0 \qquad (35)$$

In our problem, we are interested in finding an optimal temperature trajectory to minimize the processing time. Hence, the performance index is the processing time while the control is the temperature $T$, i.e.,

$$J(U) = \int_{t_0}^{t_f} dt \qquad (36)$$

where $U = T$. Comparing with equation (28), we obtain $h(X(t_f),t_f) = 0$ and $g(X(t),U(t),t) = 1$, and the boundary conditions

$$X^*(t_0) = X_0 \qquad (37)$$

$$X^*(t_f) = X_f \qquad (38)$$

$$H(X^*(t_f),U^*(t_f),P^*(t_f),t_f) = 0 \qquad (39)$$

Since $P^*(t_f)$ is not specified in our case, we cannot directly apply the previously mentioned variation of extremals algorithm. In order to develop a suitable modification, we consider the observed final state $X(t_f)$ as the result of the choice of initial costates $P(t_0)$ and final time $t_f$. We then guess $P^{(0)}(t_0)$ and $t_f^{(0)}$, and update them so that the observed $X(t_f)$ converges to the desired value. Thus, we finally obtain the following equations for updating $P^{(i+1)}(t_0)$ and $t_f^{(i+1)}$:

$$X_f - X^{(i)}(t_f) = \frac{\partial X}{\partial P_0}(t_f^{(i)},P_0^{(i)})\,(P_0^{(i+1)} - P_0^{(i)}) + \frac{\partial X}{\partial t_f}(t_f^{(i)},P_0^{(i)})\,(t_f^{(i+1)} - t_f^{(i)}) \qquad (40)$$

$$0 = \frac{\partial H}{\partial X}\,[X_f - X^{(i)}(t_f)] + \frac{\partial H}{\partial P}\,\frac{\partial P}{\partial P_0}(t_f^{(i)},P_0^{(i)})\,(P_0^{(i+1)} - P_0^{(i)}) \qquad (41)$$

$$\begin{bmatrix} P_0^{(i+1)} \\ t_f^{(i+1)} \end{bmatrix} = \begin{bmatrix} P_0^{(i)} \\ t_f^{(i)} \end{bmatrix} + \cdots$$

The iteration is terminated when $\|X_f - X(t_f^{(i)})\| < \gamma$ is satisfied.

It should be emphasized that the above procedure for computing the optimal trajectory does not account for the possibility that, during the iterations, the instantaneous step coverage $\Gamma$ may become a complex number, especially near closure. Although a systematic way to prevent this situation is feasible (e.g., by introducing an additional constraint in the optimal control problem), for the sake of simplicity we adopted a different approach. We use a strategy similar to that used for PRCVD in that we only find the optimal control during the first leg so as to minimize the corresponding time, while for the second leg the processing time is fixed and the temperature is held constant.


RESULTS  AND  DISCUSSION 

To compare the results of CRCVD, PRCVD and OCCVD, let us consider the model of SiO2 deposition by TEOS decomposition, as described by Equations (19-20). The desired step coverage considered here is $SC = 0.96$. For best results in the PRCVD case, Reference [5] suggests the following choice of parameters: $\Gamma = 0.9775$ during the programmed rate leg and $SP = 0.899$. With these values, and as shown ..., the temperature ... until the switching point SP and remains constant thereafter. The total deposition time achieved using PRCVD, for 98 percent closure, is 388 seconds. For the same SC and percent closure, CRCVD requires ... seconds of deposition time. The PRCVD process achieves 47% savings in processing time, compared with the CRCVD process which provides the same step coverage.

For OCCVD, we first select the time of the second leg as 100 seconds. This value is somewhat arbitrary and could be selected more systematically by performing a one-parameter optimization with respect to the second leg time. However, for the sake of simplicity, this approach is not adopted in the present study. With the temperature held constant in this leg, equal to the final temperature of the first leg in the previous iteration, equations (19-20) are integrated backwards in time to obtain the final state $X(t_f)$ at the end of the first leg. This final state is then substituted ... to yield the next ... $t_f$. ... computed from ... The procedure is repeated ... the OCCVD approach ... Compared to the PRCVD approach, OCCVD yields ... savings ... CRCVD ... for OCCVD, 6.3% for PRCVD, and 4.6% for CRCVD. These results ... predicted using the simplified model ... accurate process model is used. The above results demonstrate that significant savings in processing time can be obtained by using optimal control theory to compute temperature trajectories for a CVD process.

In practice, the time varying temperature predicted by OCCVD will also affect ... pressures. As described in Ref. [7], reactor scale simulation could be used to determine the interaction between temperatures and partial pressures, and to establish reactor set-point trajectories. In contrast to the PRCVD ... (for example, ... constraints on the ... temperature). These issues are left as topics of future research.
Abstract

In this paper, optimal control theory is applied to develop an alternative process protocol in single wafer reactor LPCVD on patterned wafers in an effort to minimize the processing time, for a given final step coverage. To achieve this, the operating conditions are changed during the deposition in a prescribed manner. A simplified control model is developed from the simultaneous one-dimensional Knudsen diffusion and chemical reaction description. The optimal control problem is formulated to find a temperature trajectory yielding the minimum processing time and its solution is computed numerically via a modified variation of extremals method. To demonstrate the concept of optimal control CVD (OCCVD), we consider the thermally activated deposition of silicon dioxide (SiO2) from tetraethylorthosilicate (TEOS). Using the simplified control model, the estimated process time to achieve a 96% step coverage at 98% closure with the constant rate CVD (CRCVD) strategy is 729 seconds. Under the same conditions, the optimal control CVD (OCCVD) process time is 278 seconds. Compared to CRCVD, the process time saved with OCCVD is 62%.

1  Introduction 

With the increasing demand for larger wafer diameters, single wafer reactors are preferred over volume-loaded multiple-wafer reactors, since they offer better deposition uniformity. Step coverage and film thickness uniformity are critical constraints in low pressure chemical vapor deposition (LPCVD) of patterned wafers. In order to maintain high device throughputs in single wafer reactors, the deposition rate must be much higher than in multiple wafer reactors. An important issue arising here is that, in general, the step coverage deteriorates with increasing deposition rate. Thus, maintaining high step coverage under high deposition rate is a big challenge in designing rapid thermal LPCVD processes. This problem is difficult to solve using conventional constant rate CVD (CRCVD), in which deposition conditions are held constant during the majority of the deposition process. After an initial transient, the deposition rate remains essentially constant at each location on the wafer surface during the process. During deposition, feature aspect ratios increase, and the deposition rate during CRCVD must be low enough to ensure good step coverage as features close, where the aspect ratios are the highest. Cale and coworkers have suggested programmed rate CVD (PRCVD) for single wafer reactor (SWR) LPCVD processes, in which the operating conditions are changed during the deposition in a prescribed manner [1-3]. The deposition conditions are changed in PRCVD such that the deposition rate decreases during processing, as the feature aspect ratios increase. PRCVD can be used to decrease the deposition time for a given final step coverage, thereby increasing throughput.

This work was supported by ARPA under grant F49620-93-1-0062.

In this paper, we employ optimal control theory to develop a new approach, referred to as optimally controlled chemical vapor deposition (OCCVD), for the specific problem of maximizing throughput for a specified step coverage. For the purpose of developing such a control strategy, we employ an approximate model based on a one-dimensional Knudsen diffusion and chemical reaction description [1-3]. The time savings achieved by OCCVD depend upon the chemistry and other process parameters, including the step coverage constraint. In this work, the temperature is changed during deposition. More generally, reactant flow rates and other process set-points could be changed. Our simulation results show that the temperature trajectory obtained by solving the associated optimal control problem can yield even further reduction of the processing time than the PRCVD approach. Step coverages obtained by the OCCVD process were computed using the simulation package EVOLVE [4] in order to ensure that errors introduced through the use of the approximate model are relatively small.

Figure 1. Cross section of a feature during deposition

0-7803-2928-7/95/$3.00 ©1995 IEEE. 1995 International Symposium on Semiconductor Manufacturing.


2  Modeling 

For the purpose of developing control strategies, we treat transport as a one dimensional process in which species fluxes are expressed in terms of local concentration gradients and Knudsen diffusion coefficients. Cale and coworkers [1-3] presented the equations which are appropriate for this simplistic model of transport and reaction at high Knudsen number. In this work, we assume that the deposition rate depends on a single species concentration, the features are spatially isothermal and there is no surface diffusion. We deal with deposition in trenches; however, the same methodology applies to vias as well. The origin of the coordinate system is at the center of the trench mouth, and moves with the surface during the deposition. Figure 1 shows such a trench: X(t) is the film thickness in the bottom of the trench at time t; L(t) is the film thickness at the mouth of the trench at time t; Z_b = H(t) is the instantaneous trench depth; W(Z,t) is the instantaneous width of the trench at depth Z; W_0 is the initial width of the trench; and H_0 is the initial trench depth.

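The concentration of any species i at a depth Z in the feature is governed by

$$\frac{\partial}{\partial t}\bigl(A(Z,t)\,C_i(Z,t)\bigr) = \frac{\partial}{\partial Z}\left[A(Z,t)\,D_i(Z,t)\,\frac{\partial C_i(Z,t)}{\partial Z}\right] + P(Z,t)\sum_{j=1}^{r}\nu_{ij}\,R_j\bigl(p_i(Z,t),T(t)\bigr) \qquad (1)$$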

where C_i is the instantaneous, local concentration of species i. A and P denote the area available for molecular flow and the feature perimeter at position Z and time t, respectively. p_i(Z,t) is the instantaneous partial pressure of species i at position Z and time t, while T(t) denotes the temperature at the wafer surface at time t. R_j is the instantaneous rate of the j-th heterogeneous chemical reaction based on local conditions and ν_ij is the generalized stoichiometric coefficient of species i in reaction j. D_i is the instantaneous, cross sectional-averaged, local Knudsen diffusivity of a gaseous species i. For infinitely long rectangular trenches, Ref. [3] gives an estimate of D_i. The boundary conditions for the above second order partial differential equation are

$$D_i(Z_b,t)\,\frac{\partial C_i(Z_b,t)}{\partial Z} = \sum_{j=1}^{r}\nu_{ij}\,R_j\bigl(p_i(Z_b,t),T(t)\bigr) \qquad (3)$$

where R_g is the ideal gas constant.

Table 1: Definition of Dimensionless Variables

The LPCVD process of deposition of silicon dioxide by TEOS pyrolysis involves a single gaseous reactant and deposition of a single solid species, SiO2. The reaction stoichiometry and rate expression we use for this process are [3, 5]:


(£i)  w 

where the temperature T and partial pressure p have units of Kelvin and torr, respectively. The activation energy is in kJ/mol.
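As a rough illustration of the units stated above, a rate expression of Arrhenius form can be evaluated as in the following sketch. The pre-exponential factor 38.77 and the half-order pressure dependence are taken from the simplified model, equation (15), used later in this paper. The activation energy in the example is a placeholder assumption and not a value from this work, and the function name is hypothetical.

```python
import math

# Ideal gas constant expressed in kJ/(mol K) so that Ea can be given in kJ/mol.
R_GAS_KJ = 8.314462618e-3


def arrhenius_rate(temp_k, p_torr, ea_kj_mol, k0=38.77, order=0.5):
    """Evaluate k0 * exp(-Ea / (Rg * T)) * p**order.

    temp_k    : wafer temperature in Kelvin
    p_torr    : reactant partial pressure in torr
    ea_kj_mol : activation energy in kJ/mol (assumed value supplied by caller)
    """
    if temp_k <= 0:
        raise ValueError("temperature must be positive (Kelvin)")
    return k0 * math.exp(-ea_kj_mol / (R_GAS_KJ * temp_k)) * p_torr ** order


EA_ASSUMED = 150.0  # kJ/mol, hypothetical placeholder for illustration only

r_low = arrhenius_rate(950.0, 1.0, EA_ASSUMED)
r_high = arrhenius_rate(1000.0, 1.0, EA_ASSUMED)
# Quadrupling the pressure doubles the rate for a half-order dependence.
pressure_ratio = arrhenius_rate(1000.0, 4.0, EA_ASSUMED) / r_high
```

The strong temperature sensitivity of the exponential term is the reason temperature is a convenient control variable for the deposition rate.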


For the purposes of identifying the important parameters that dictate the step coverage and to determine their dependence on the CVD chemistry and operating conditions, the model equations are nondimensionalized [1-3, 5], using the definitions in Table 1. With these definitions, the model equations for LPCVD of SiO2 from TEOS in long rectangular trenches become [3]:

Species balance for TEOS:
(5)


1  a  3$ ,  oG  J 

wh*  i?lI'vva?  ~  T7(1  -  7 


Boundary  conditions: 


J(0,)  =  «  (6, 

'  ’  rc)p(o.o) 

-)  _  «(-)  ,  ,  Gib.  -) 

dz  2o0  '  ]mh,-) 

where G is the dimensionless rate of reaction, φ is the step coverage modulus given by

$$\phi(t) = \frac{2H^{2}(0)\,R\bigl(p(0,t),T(t)\bigr)}{C(0,0)\,D(0,0)\,W(0,0)} \qquad (8)$$

and a_0 is the initial aspect ratio, given by a_0 = H_0/W_0, and subscript b refers to the base of the feature.
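The step coverage modulus is easy to evaluate numerically. The sketch below assumes the reading φ = 2H²(0)R / (C(0,0) D(0,0) W(0,0)) of equation (8) and the stated definition a_0 = H_0/W_0. The function names and the input values are hypothetical and chosen only to make the arithmetic easy to follow; consistent units are left to the caller.

```python
def step_coverage_modulus(h0, w0, rate, conc0, diff0):
    """phi = 2 * H(0)**2 * R / (C(0,0) * D(0,0) * W(0,0))."""
    if min(w0, conc0, diff0) <= 0:
        raise ValueError("width, concentration and diffusivity must be positive")
    return 2.0 * h0 ** 2 * rate / (conc0 * diff0 * w0)


def initial_aspect_ratio(h0, w0):
    """a_0 = H_0 / W_0."""
    return h0 / w0


# Illustrative numbers only (not process data).
phi = step_coverage_modulus(h0=2.0, w0=1.0, rate=0.5, conc0=4.0, diff0=0.25)
phi_doubled_rate = step_coverage_modulus(h0=2.0, w0=1.0, rate=1.0, conc0=4.0, diff0=0.25)
a0 = initial_aspect_ratio(2.0, 1.0)
```

Because φ is linear in the surface reaction rate R, prescribing a rate path during deposition is equivalent to prescribing a path for φ.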

We assume that at any particular instant in time, the gas-phase concentration profiles in the feature are the steady-state profiles that would exist for the reaction conditions prevailing at that instant and that the molar density of the gas is negligible in comparison to that of the solid film [3]. Using these assumptions, the species balance for TEOS, given by equation (7), simplifies to:

$$\frac{\partial}{\partial \xi}\left[\mathcal{D}(\xi,\tau)\,w(\xi,\tau)\,\frac{\partial \theta(\xi,\tau)}{\partial \xi}\right] - \phi(\tau)\,G(\xi,\tau)\,\eta^{2}(\tau) = 0 \qquad (9)$$

We discuss the simple case where the wafer temperature is varied while maintaining the TEOS partial pressure at the wafer surface constant. The boundary conditions for the above second order differential equation are

dr 2or0 Tqf&.r)

The reactant concentration profile in the feature at any instant in time can be determined by solving the above differential equation and is given by

#(£, r) = 0(0, t) - 4>{r)ri2(r) [</($&, i
Wf ft,, ricn».r)
i. r) W( j,r J
JO X>(4.rjW(jfr;   (12)

where g(ξ, τ) = ∫_0^ξ G(s, τ) ds.

3 Programmed Rate Chemical Vapor Deposition (PRCVD)

The objective of a PRCVD process protocol is to vary the deposition rate as the deposition progresses, by changing one or more of the process parameters, so as to decrease the deposition time, subject to a specific final step coverage constraint. The PRCVD process path employed in References [3] and [5] consists of two distinct legs: 1. A programmed rate leg during which the deposition rate at the wafer surface is adjusted so as to keep the instantaneous differential step coverage Γ (see below) constant. 2. A constant rate leg during which the process is continued at a constant deposition rate.

The point at which the process is changed from the programmed rate leg to the constant rate leg is referred to as the switching point (SP). The switching point is defined to be the time at which a specified percentage of feature closure is obtained. Therefore, a PRCVD process path is determined by selecting Γ and SP. A judicious selection of Γ and SP can lead to a higher average rate of deposition during the process and, hence, to significant savings in processing time compared with the corresponding CRCVD process yielding the same step coverage.

We demonstrate the use of this PRCVD approach by considering the case of SiO2 deposition from TEOS decomposition in a rectangular trench. Since the reaction rate at the wafer surface varies during PRCVD, the step coverage modulus φ, which is proportional to the instantaneous deposition rate at the wafer surface (see equation (8)), also varies. Thus, the problem of determining a rate path is equivalent to finding a path for the step coverage modulus φ. For this process, φ can be expressed as

<p(r)
2HqR9Tq
L0Dt0Pio K° eXP   (13)

The above equation reveals that the step coverage modulus can be varied by changing either the partial pressure of TEOS at the wafer surface, or the wafer temperature, or both, during the programmed rate leg. In this work, the deposition rate is adjusted by changing the temperature, since this has a large effect on reaction rate.

The instantaneous differential step coverage Γ is defined by Γ = dX/dL. Because dL/dt = R(p(0,t),T(t)), dX/dt = R(p(Z,t),T(t)) and 2X(t) + W(Z,t) = W_0 (see Figure 1), we obtain

$$\Gamma(\xi,\tau) = \left[\frac{\theta(\xi,\tau)}{\theta(0,\tau)}\right]^{0.5} \qquad (14)$$

Since a path conforming to the PRCVD during the programmed rate leg of the process corresponds to constant Γ, equation (12) results in an expression for φ(τ).

During the programmed rate leg of PRCVD, the temperature is determined from equations (12), (13) and (14), using a first order approximation described in Ref. [5]. During the constant rate leg (after SP), the temperature is held constant and equal to the temperature achieved at the end of the programmed rate leg. A reduction of processing time using PRCVD versus CRCVD depends on the selected values of the process path parameters Γ and SP. For a given step coverage, however, Γ and SP may be found by trial and error so as to minimize processing time [3, 5]. One such method is to first fix Γ and then choose the value of SP which achieves the desired final step coverage. A minimum time is then obtained by searching over the range of Γ for the one yielding the minimum total time. Figure 2 shows one of the possible PRCVD paths for SiO2 deposition by TEOS decomposition.
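The trial-and-error selection of Γ and SP described above can be sketched as a nested search: for each candidate Γ, find the SP that just meets the step coverage target, then keep the pair with the smallest total time. The code below is only an outline under stated assumptions. The callables `step_coverage` and `total_time` stand in for a process simulation, the bisection assumes the final step coverage increases monotonically with SP, and the toy functions at the bottom are arbitrary stand-ins, not a deposition model.

```python
def find_sp(step_coverage, gamma, target, lo=0.0, hi=1.0, tol=1e-6):
    """Bisection on SP, assuming step_coverage(gamma, sp) increases with sp."""
    if step_coverage(gamma, hi) < target:
        return None  # target cannot be reached for this gamma
    if step_coverage(gamma, lo) >= target:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if step_coverage(gamma, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


def search_prcvd_path(step_coverage, total_time, gammas, target):
    """Return (gamma, sp, time) with the smallest total time, or None."""
    best = None
    for gamma in gammas:
        sp = find_sp(step_coverage, gamma, target)
        if sp is None:
            continue
        t = total_time(gamma, sp)
        if best is None or t < best[2]:
            best = (gamma, sp, t)
    return best


# Toy stand-ins for a process simulation (hypothetical, for demonstration only).
def toy_step_coverage(gamma, sp):
    return 0.9 + (gamma - 0.9) * sp


def toy_total_time(gamma, sp):
    return 700.0 - 300.0 * sp + 1000.0 * (gamma - 0.95) ** 2


best = search_prcvd_path(toy_step_coverage, toy_total_time,
                         gammas=[0.95, 0.97, 0.98, 0.99], target=0.96)
```

In practice each evaluation of `step_coverage` would require a full deposition simulation, which is one reason a systematic optimal control formulation is attractive.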


4 Optimally Controlled Chemical Vapor Deposition (OCCVD)

Problem Formulation

To apply optimal control theory to the CVD process, we describe the process model as a set of ordinary differential equations. Since the step coverage is evaluated as the ratio of the thickness at the base of the feature to the thickness at the wafer surface, we use the equations which describe the growth rate at the wafer surface and the growth rate at the base of the trench as the process model. For SiO2 deposition by TEOS decomposition, the simplified model of the process is written as

$$\frac{dL(t)}{dt} = R = 38.77\,\exp\left(-\,\cdots\right)\,p^{0.5}(0,t) \qquad (15)$$

$$\frac{dX(t)}{dt} = \Gamma(t)\,R \qquad (16)$$

where equation (16) comes from the definition of the instantaneous step coverage Γ.

We can obtain the expression for Γ by using the first order perturbation solution. The first order perturbation solution of the pseudosteady-state model provides a reasonable estimate of concentration and deposition profiles for the range of φ values (reaction rates) which yield conformal depositions, since the concentration does not vary by more than a few percent from feature mouth to feature base [3, 5]. The first order perturbation solution to equation (12) for the dimensionless concentration profile in a trench can be written as

«(lr) = «(0,r)- o7i(r)(g -f- i) / V{Otr)HQa0   (17)

where α = ⋯. Combining equations (13), (14) and (17), we obtain the following expression for Γ

no = -±Y
To \SKbJ
(or -f l)v(or)
K(p(0. 0,7(0)
i -v   (18)

where

= l6cr(Z.0 + 2aJ(Z.;) / 13- 7o(Z,:)   (19)

The above equations reveal that Γ depends on the variables T and a, since R(p(0,t), T(t)) depends only on the temperature T(t) when the partial pressure p(0,t) is constant. Thus, the optimal control can be cast as follows:

For the system model equations (15) and (16), find an optimal control T (temperature), to minimize the processing time t_f subject to the constraints

$$X(0) = L(0) = 0 \qquad (20)$$

$$\frac{X(t_f)}{L(t_f)} \ge \text{a given step coverage} \qquad (21)$$

$$L(t_f) = \text{a given thickness} \qquad (22)$$

where Γ is given by equation (18).

Method

Since the system model is described by nonlinear ordinary differential equations, we do not attempt to solve the nonlinear optimal control problem analytically. Instead, we employ numerical techniques to determine the optimal control trajectories as the solution to a two-point boundary-value problem. One such general method is the so-called variation of extremals [6]. However, since in its standard form this method is not directly applicable to solve the optimal control for the CVD process, a suitable modification needs to be derived. In order to achieve this, we begin by employing optimal control theory to determine necessary conditions for a control trajectory to be optimal. In general, the problem is to find an admissible control U in the feasible set Ω, that causes the system

$$\dot{X}(t) = a\bigl(X(t),U(t),t\bigr) \qquad (23)$$

to follow an admissible trajectory X*(t) that minimizes the performance index

$$J(U) = h\bigl(X(t_f),t_f\bigr) + \int_{t_0}^{t_f} g\bigl(X(t),U(t),t\bigr)\,dt \qquad (24)$$

and satisfies boundary conditions X(t_0) = X_0 and X(t_f) = X_f, where t_0 is the specified initial time and t_f is the unknown final time.

By defining the Hamiltonian function

$$H\bigl(X(t),U(t),P(t),t\bigr) = g\bigl(X(t),U(t),t\bigr) + P^{T}(t)\,\bigl[a\bigl(X(t),U(t),t\bigr)\bigr] \qquad (25)$$

the necessary conditions for optimality can be written as follows [6]:

$$\dot{X}^{*}(t) = \frac{\partial H}{\partial P}\bigl(X^{*}(t),U^{*}(t),P^{*}(t),t\bigr) \qquad (26)$$

$$\dot{P}^{*}(t) = -\frac{\partial H}{\partial X}\bigl(X^{*}(t),U^{*}(t),P^{*}(t),t\bigr) \qquad (27)$$

$$H\bigl(X^{*}(t),U^{*}(t),P^{*}(t),t\bigr) = \min_{U \in \Omega} H\bigl(X^{*}(t),U(t),P^{*}(t),t\bigr) \qquad (28)$$
for all t ∈ [t_0, t_f], with boundary conditions

$$X^{*}(t_0) = X_0 \qquad (29)$$

$$X^{*}(t_f) = X_f \qquad (30)$$

$$H\bigl(X^{*}(t_f),U^{*}(t_f),P^{*}(t_f),t_f\bigr) + \frac{\partial h}{\partial t}\bigl(X^{*}(t_f),t_f\bigr) = 0 \qquad (31)$$

In our problem, we are interested in finding an optimal temperature trajectory to minimize the processing time. Hence, the performance index is the processing time while the control is the temperature T, i.e.,

$$J(U) = \int_{t_0}^{t_f} dt \qquad (32)$$

where U = T. Comparing with equation (24), we obtain h(X(t_f), t_f) = 0 and g(X(t), U(t), t) = 1, and the boundary conditions

$$X^{*}(t_0) = X_0 \qquad (33)$$

$$X^{*}(t_f) = X_f \qquad (34)$$

$$H\bigl(X^{*}(t_f),U^{*}(t_f),P^{*}(t_f),t_f\bigr) = 0 \qquad (35)$$
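Substituting g = 1 and h = 0 into the Hamiltonian makes the structure of the minimum-time problem explicit. The following is a sketch in the notation of this section; the symbol a(X, U, t) for the right-hand side of the state equation and Ω for the feasible set of controls are assumed readings where the source is hard to make out.

```latex
\begin{aligned}
H(X,U,P,t) &= 1 + P^{T}(t)\,a\bigl(X(t),U(t),t\bigr),\\
U^{*}(t) &= \arg\min_{U\in\Omega}\; P^{*T}(t)\,a\bigl(X^{*}(t),U,t\bigr),\\
0 &= 1 + P^{*T}(t_f)\,a\bigl(X^{*}(t_f),U^{*}(t_f),t_f\bigr).
\end{aligned}
```

The constant 1 does not affect the minimization over U, so at each instant the optimal temperature is the one that minimizes the product of the costate and the state derivative. The last line is the terminal condition (35) written out for this performance index.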

Since the final time t_f is not specified in our case, we cannot directly apply the previously mentioned variation of extremals algorithm. In order to develop a suitable modification, we consider the observed final state X(t_f) as the result of the choice of initial costates P(t_0) and final time t_f. We then guess P^(0)(t_0) and t_f^(0), and update them so that the observed X(t_f) converges to the desired value. Thus, we finally obtain the following equations for updating P^(i+1)(t_0) and t_f^(i+1):

- = §%{p\p")(Pp+1) - P(o'])+ dX/dtf {t?,PP)(t(fi+1)-tlp)   (36)

dJL[Xl-x^p)\ +d^^-y;\mpp+i)-p^)   (37)

The iteration is terminated when ||X_f − X(t_f^(i))|| < γ is satisfied.
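The update scheme above has the structure of a Newton iteration on the unknowns (P(t_0), t_f). The sketch below illustrates that structure only. It is not the authors' algorithm: the sensitivities are approximated by finite differences, there is a single scalar costate, and the `toy_residual` function is a hypothetical stand-in for "integrate the state and costate equations and report the mismatch in the terminal conditions".

```python
def newton_shooting(residual, p0, tf, tol=1e-10, max_iter=50, h=1e-7):
    """Newton iteration on two unknowns (p0, tf) driving residual(p0, tf) to zero.

    residual returns a pair (r1, r2), e.g. the terminal state mismatch and the
    terminal Hamiltonian condition. Sensitivities use forward differences.
    """
    for _ in range(max_iter):
        r1, r2 = residual(p0, tf)
        if (r1 * r1 + r2 * r2) ** 0.5 < tol:
            return p0, tf
        # Finite-difference sensitivities with respect to p0 and tf.
        a1, a2 = residual(p0 + h, tf)
        b1, b2 = residual(p0, tf + h)
        j11, j21 = (a1 - r1) / h, (a2 - r2) / h
        j12, j22 = (b1 - r1) / h, (b2 - r2) / h
        det = j11 * j22 - j12 * j21
        if det == 0.0:
            raise ZeroDivisionError("singular sensitivity matrix")
        # Solve J * d = -r for the 2x2 system by Cramer's rule.
        dp = (-r1 * j22 + j12 * r2) / det
        dt = (-j11 * r2 + j21 * r1) / det
        p0, tf = p0 + dp, tf + dt
    raise RuntimeError("shooting iteration did not converge")


def toy_residual(p0, tf):
    """Hypothetical terminal mismatch with a known root at p0 = 1, tf = 2."""
    return (p0 * tf - 2.0, p0 - 0.5 * tf)


p0_star, tf_star = newton_shooting(toy_residual, 1.5, 1.5)
```

In the deposition problem each residual evaluation would involve integrating the model equations, so the cost of the iteration is dominated by those integrations.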

It should be emphasized that the above procedure for computing the optimal trajectory does not account for the possibility that, during the iterations, the instantaneous step coverage Γ may become a complex number, especially near closure. Although a systematic way to prevent this situation is feasible (e.g., by introducing an additional constraint in the optimal control problem), for the sake of simplicity we adopted a different approach. We use a strategy similar to that used for PRCVD in that we only find the optimal control during the first leg so as to minimize the corresponding time, while for the second leg the processing time is fixed and the temperature is held constant.

Figure 2: Temperature trajectory comparison under CRCVD, PRCVD and OCCVD

5 Simulation Results and Discussion

To compare the results of CRCVD, PRCVD and OCCVD, let us consider the model of SiO2 deposition by TEOS decomposition, as described by Equations (15-16). The desired step coverage (SC) considered here is SC = 0.96. For best results in the PRCVD case, Reference [5] suggests the following choice of parameters: Γ = 0.9775 during the programmed rate leg and SP = 0.899. With these values and as shown in Figure 2, the temperature decreases until the switching point SP and remains constant thereafter. The total deposition time achieved using PRCVD, for 98 percent closure, is 386 seconds. For the same SC and percent closure, CRCVD requires 729 seconds of deposition time. The PRCVD process achieves 47% savings in processing time, compared with the CRCVD process which provides the same step coverage.

For OCCVD, we first select the time of the second leg as 100 seconds. This value is somewhat arbitrary and could be selected more systematically by performing a one-parameter optimization with respect to the second leg time. However, for the sake of simplicity, this approach is not adopted in the present study. With the temperature held constant in this leg, equal to the final temperature of the first leg in the previous iteration, equations (15-16) are integrated backwards in time to obtain the final state X(t_f) at the end of the first leg. This final state is then substituted in equation (37) to yield the next estimate for P(t_0) and t_f. These values are in turn used to integrate the state and costate equations (26-27) forward in time with the input U (temperature) computed from equation (28). The procedure is repeated until convergence is achieved. The results of the OCCVD approach are shown in Figure 3. The first leg time is 178 seconds and the total time, for 98 percent closure, is 278 seconds. Compared to the PRCVD approach, OCCVD yields a 28% reduction of deposition time. The deposition time savings is 62% when compared with CRCVD (see Figure 2).

Figure 3: Simulated final trench deposition profile under the OCCVD temperature trajectory (PRCVD and CRCVD final profiles are similar). (a) solid: EVOLVE simulation, (b) dashed: simplified control model, used in the OCCVD computations.
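The quoted savings follow directly from the reported process times. The short check below uses only numbers stated in this section (729 s for CRCVD, 386 s for PRCVD, 278 s for OCCVD, with OCCVD legs of 178 s and 100 s); the helper name is hypothetical.

```python
def percent_saving(reference_s, improved_s):
    """Relative reduction in process time, in percent of the reference time."""
    return 100.0 * (reference_s - improved_s) / reference_s


crcvd_s, prcvd_s, occvd_s = 729.0, 386.0, 278.0

prcvd_vs_crcvd = round(percent_saving(crcvd_s, prcvd_s))  # PRCVD relative to CRCVD
occvd_vs_prcvd = round(percent_saving(prcvd_s, occvd_s))  # OCCVD relative to PRCVD
occvd_vs_crcvd = round(percent_saving(crcvd_s, occvd_s))  # OCCVD relative to CRCVD

# The OCCVD total is the optimized first leg plus the fixed second leg.
occvd_total_s = 178.0 + 100.0
```

Rounded to the nearest percent, these reproduce the 47%, 28% and 62% figures quoted in the text.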


These estimated process time savings are validated by simulating the OCCVD, PRCVD and CRCVD process paths using the EVOLVE software [4]. The simulation results (Figure 3) show that the step coverage predicted using the simplified model differs from the more accurate EVOLVE predictions by 2.2% for OCCVD, 6.8% for PRCVD, and 4.6% for CRCVD. These results indicate that the time savings predicted using the simplified model remain qualitatively the same when a more accurate process model is used. The above results demonstrate that significant savings in processing time can be obtained by using optimal control theory to compute temperature trajectories for a CVD process.

In practice, the time varying temperature predicted by OCCVD will also affect partial pressures. As described in Ref. [7], reactor scale simulation could be used to determine the interaction between temperatures and partial pressures, and to establish reactor set-point trajectories. In contrast to the PRCVD approach, optimal control theory can easily account for the constraints imposed by the temperature dynamics in the solution (for example, by introducing constraints on the derivative of the temperature). These issues are left as topics of future work.
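To illustrate what a constraint on the derivative of the temperature means for a sampled trajectory, the sketch below clamps the rate of change between consecutive set-points. This is a simple post-processing illustration under assumed names and values; it is not the constrained optimal control formulation mentioned above, which would impose the limit inside the optimization itself.

```python
def limit_slew_rate(times, temps, max_rate):
    """Clamp |dT/dt| between consecutive samples to max_rate (K per second).

    times : increasing sample times in seconds
    temps : requested temperature set-points in Kelvin
    Returns a new list of set-points that respects the rate limit.
    """
    if len(times) != len(temps):
        raise ValueError("times and temps must have the same length")
    if not temps:
        return []
    limited = [temps[0]]
    for k in range(1, len(temps)):
        max_step = max_rate * (times[k] - times[k - 1])
        step = temps[k] - limited[-1]
        # Restrict the requested change to the allowed band.
        step = max(-max_step, min(max_step, step))
        limited.append(limited[-1] + step)
    return limited


# Hypothetical requested trajectory with a drop faster than 15 K/s.
requested = [1000.0, 980.0, 940.0, 940.0]
achievable = limit_slew_rate([0.0, 1.0, 2.0, 3.0], requested, max_rate=15.0)
```

The limited trajectory lags the requested one, which shows why heater dynamics can erode part of the predicted time savings if they are ignored when the trajectory is computed.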
Abstract

An optimally controlled chemical vapor deposition (OCCVD) protocol is developed by employing an approximate one-dimensional Knudsen diffusion and chemical reaction description of LPCVD, for the specific problem of maximizing throughput for a specified step coverage. The ... is obtained by an adaptive algorithm ... reactor scale model. Rigorous feature scale process simulations are used to verify that the ...


1  Introduction 

In low pressure chemical vapor deposition (LPCVD) processes, the species to be deposited react on the surface to form films. Single wafer LPCVD reactors are becoming more competitive, relative to multi-wafer LPCVD reactors, due to improvements in deposition uniformity. A schematic diagram of a single wafer LPCVD reactor is illustrated in Fig. 1. To maintain device throughput in single wafer reactors at levels which are competitive with multiple wafer reactors, the deposition rate must be much higher. In LPCVD on patterned wafers, step coverage is a critical constraint. In general, step coverage deteriorates with increasing deposition rates; thus, maintaining step coverage while increasing the deposition rate is a big challenge for the design of LPCVD processes, in particular rapid thermal LPCVD. In conventional constant rate CVD (CRCVD), the deposition conditions are held constant during the majority of the deposition process. Between startup and shutdown transients, the deposition rate remains essentially constant at each location on the wafer surface during the process. It is well known that the step coverage decreases as feature aspect ratios increase, so the deposition rate during CRCVD must be low enough to ensure an acceptable step coverage over the course of the deposition; step coverage degrades rapidly as features close and the aspect ratios are highest [1]. Cale and coworkers have suggested programmed rate

This work was supported under grant F49620-93-1-0062. Work on EVOLVE was supported by SRC and NSF.
CVD (PRCVD) for single wafer reactor LPCVD processes, in which the operating conditions are changed in a prescribed manner during the deposition [1, 2]. That is, the deposition rate decreases during processing as feature aspect ratios increase. PRCVD can be used to decrease the deposition time for a given final step coverage, thereby increasing throughput, because the initial deposition rate is much higher than in the CRCVD process which yields the same overall step coverage. Recent experimental work confirms the applicability of the PRCVD concept to blanket tungsten CVD using the hydrogen reduction of tungsten hexafluoride [3]. Films deposited with average rates which were over a factor of three higher than reference CRCVD processes had equivalent step coverages. In addition, all other PRCVD film properties were either as good as or better than the properties of CRCVD films.

In this paper, optimal control principles are employed, together with a combination of reactor scale and feature scale models, for the purpose of maximizing the process throughput while maintaining a prescribed step coverage. Using a simplified feature scale model, optimal control inputs for the wafer are developed in an effort to minimize the processing time, subject to specified step coverage and final film thickness constraints. We use the processing temperature as the control (or manipulated) variable while keeping all other conditions constant for the feature scale model. Our simulation results show that the temperature trajectory obtained by solving the associated optimal control problem can yield a significant reduction of the processing time compared to the CRCVD approach, and some improvement relative to PRCVD processes.

In our feature scale analysis and simulations we assume that the species fluxes on the wafer surface remain constant. Such a situation does not occur naturally, since the conversion level of reactant species decreases as the deposition rate decreases. Thus, in order to achieve constant species fluxes, we employ an adaptive algorithm on the reactor scale model to determine the reactor conditions for which, under the desired temperature trajectory, the species partial pressures remain constant. This procedure allows the derived optimal processing conditions on the wafer surface to be translated to optimal reactor-scale conditions that can be controlled through the available manipulated variables (temperature and flowrates). Future work will address issues related to relaxing the assumptions used for feature-scale simulation.

2  Feature-Scale  Modeling  of  Single  Wafer  Tungsten  LPCVD 

A rigorous model of transport and reaction inside micron scale features during LPCVD is the “ballistic transport and reaction model” (BTRM) presented by Cale and coworkers [4, 5, 6, 7]. In the BTRM, three dimensional transport of species in free molecular flow and the reactions which consume or generate species are represented by Clausing-like integral equations. One accepted method to test, refine and validate models for transport and reaction kinetics is to use a process simulator such as EVOLVE [8], which solves the BTRM equations, to simulate film deposition using these transport and reaction kinetic submodels. Simulated film profiles are compared with experimental film profiles; if the comparison is good, then it is likely that the models are satisfactory for engineering applications. The state-of-the-art procedure is to perform three dimensional (3-D) transport and reaction simulations on 3-D surfaces which can be represented in two dimensions by cross-sectioning (trenches and vias). These 3-D/2-D simulations can then be compared with SEM (or TEM) cross-sections of trenches, lines, or vias with a high degree of rigor for CVD processes (though not without difficulties).

To  develop  our  control  model,  we  seek  a  model  for  transport  and  reaction  in  features  during  LPCVD 
which  is  easier  to  use  than  the  BTRM.  McConica  and  coworkers  [9,  10]  introduced  the  “diffusion  and 
reaction model” (DRM) for free molecular flow and reaction in features. In the DRM, transport is modeled
as  one  dimensional  Knudsen  diffusion  in  terms  of  local  concentration  gradients,  and  the  deposition  rate  at 
each  lateral  position  in  the  feature  is  determined  by  the  concentration  at  that  position.  Cale  and  coworkers 
extended the DRM [2, 11], and used it to analyze trends in conformality with changes in deposition conditions
[1, 12] as well as to demonstrate the PRCVD concept [1, 2, 12]. Notice that the DRM is a “local” model,
because  the  deposition  at  a  point  on  the  surface  is  governed  by  the  concentrations  in  the  vacuum  next  to  that 
point,  and  the  transport  is  governed  by  spatial  derivatives.  On  the  other  hand,  the  BTRM  is  a  “non-local” 
model,  because  depositing  species  can  travel  to  any  point  from  any  other  point  in  the  feature  or  directly 
from  the  source  volume,  depending  upon  geometric  visibility.  It  is  clear  that  the  BTRM  is  a  better  model, 
though  the  DRM  is  easier  to  manipulate  and  provides  reasonable  estimates  for  many  low  pressure  CVD 
processes.  Cale  et  al.  [13]  compared  conformality  predictions  using  simulators  based  upon  the  BTRM  and 
the  DRM,  and  found  that  they  agree  well  as  step  coverage  approaches  unity.  In  this  paper,  we  design  our 
control  scheme  to  maintain  good  conformality;  that  is,  we  keep  gradients  in  concentration  very  small  for  the 
majority  of  the  deposition.  Thus,  until  very  near  closure,  the  DRM  provides  reasonable  estimates  of  step 
coverage.  After  using  the  DRM  to  design  our  control  strategy,  we  use  simulations  based  upon  the  BTRM  to 
validate  the  predicted  protocol. 

As  in  [1,  4,  12],  the  DRM  used  in  this  work  employs  a  coordinate  system  that  moves  during  deposition 
while  the  origin  stays  at  the  center  of  the  feature  mouth.  Considering  idealized  symmetric  features,  such  a 
coordinate  system  for  a  trench  is  shown  in  Fig.  2,  where 

• X(t) is the film thickness at the bottom of the trench at time t;

• L(t) is the film thickness at the mouth of the trench at time t;

• Z_b = H(t) is the instantaneous trench depth;

• W(Z, t) is the instantaneous width of the trench at depth Z;

• W_0 is the initial trench width;

• H_0 is the initial trench depth.

Of  course,  such  a  deposition  profile  is  a  simplistic  approximation  which  does  not  consume  matter.  However, 
it  is  a  useful  approximation  to  the  results  of  more  detailed  simulations  (see  Fig.  3),  and  is  adequate  for  the 
purpose  of  developing  conditions  for  near-optimal  processing  using  optimal  control  theory. 

In  our  control  model,  we  assume  [1]  that  deposition  occurs  under  conditions  such  that  the  rate  depends 
only  on  the  concentration  of  the  limiting  reactant,  the  feature  is  spatially  isothermal,  and  that  surface 
diffusion  and  radial  or  lateral  gas  concentration  gradients  are  negligible.  We  treat  the  molecular  transport 
as  a  one-dimensional  process  in  which  species  flux  is  expressed  in  terms  of  local  concentration  gradients 
and  Knudsen  diffusion  coefficients.  The  expressions  for  Knudsen  diffusivity  used  in  the  DRM  [2,  11]  are 
based  on  cross-sectional  averages  for  idealized  feature  geometry,  e.g.,  infinitely  long  rectangular  trenches  or 
cylindrical  contact  holes.  Here,  we  only  consider  the  infinitely  long  rectangular  trench  model,  although  the 
same  methodology  is  applicable  to  cylindrical  contact  holes  as  well  [11].  Thus,  the  concentration  of  any 
species  i  at  a  depth  Z  in  the  feature  is  governed  by 


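$$\frac{\partial}{\partial t}\left[A(Z,t)\,C_i(Z,t)\right] = \frac{\partial}{\partial Z}\left[A(Z,t)\,D_i(Z,t)\,\frac{\partial C_i(Z,t)}{\partial Z}\right] + P(Z,t)\sum_{j=1}\nu_{ji}\,R_j(Z,t) \qquad (1)$$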


where $C_i$ is the instantaneous, local concentration of species i. A and P denote the feature cross-sectional
area available for molecular flow and the feature perimeter at position Z and time t, respectively. $\nu_{ji}$ is the
generalized stoichiometric coefficient of species i in reaction j. $R_j(Z,t)$ is the rate of the j-th heterogeneous
chemical reaction at position Z and time t. $D_i$ is the instantaneous, cross-sectional-averaged, local Knudsen
diffusivity of a gaseous species i.

  Dimensionless variable          Definition
  ------------------------------  ----------------------------------------------
  axial distance                  $\xi = Z/H(t)$
  time                            $\tau$ = […]
  feature depth                   $\mathcal{H}(\tau) = H(t)/H(0)$
  feature cross sectional area    $\mathcal{A}(\xi,\tau) = A(Z,t)/A(0,0)$
  feature perimeter               $\mathcal{P}(\xi,\tau) = P(Z,t)/P(0,0)$
  feature width                   $\mathcal{W}(\xi,\tau) = W(Z,t)/W(0,0)$
  Knudsen diffusivity             $\mathcal{D}_i(\xi,\tau) = D_i(Z,t)/D_i(0,0)$
  concentration                   $\theta_i(\xi,\tau) = C_i(Z,t)/C_i(0,0)$
  rate of reaction                $G_j(\xi,\tau) = R_j(Z,t)/R_j(0,0)$
  solid density                   $\tilde{\rho}_{sk} = C_{sk}/C_{R0}$

Table 1: Definition of Dimensionless Variables

For infinitely long rectangular trenches, we use the following estimate of $D_i$ [2]:

$$D_i(Z,t) = \left[\frac{8 K_B T(t)}{\pi m_i}\right]^{1/2} \frac{1}{4} \left[\frac{18 + 7\alpha(Z,t)}{18 + 16\alpha(Z,t) + 2\alpha^2(Z,t)}\right] \qquad (2)$$

where $K_B$ is the Boltzmann constant, T(t) denotes the temperature at the wafer surface at time t, and $m_i$ is the
molecular mass of species i. The instantaneous, local aspect ratio $\alpha$ for rectangular trenches is

$$\alpha(Z,t) = \frac{H(t)}{W(Z,t)} \qquad (3)$$

The boundary conditions for the above second-order partial differential equation are

$$C_i(0,t) = \frac{P_i(0,t)}{R_g T(t)} \qquad (4)$$

and

$$D_i(Z_b,t)\,\frac{\partial C_i(Z_b,t)}{\partial Z} = \sum_{j=1}\nu_{ji}\,R_j(Z_b,t) \qquad (5)$$

where $R_g$ is the ideal gas constant, $P_i(Z,t)$ is the instantaneous partial pressure of species i at position Z
and time t, and Z = 0 is the mouth of the feature.
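
As a small numerical illustration of the mouth boundary condition, which relates concentration to partial pressure through the ideal gas law, the following sketch converts a partial pressure in Torr to a molar concentration. The function name, the SI unit choices and the example values (1 Torr, 300 K) are assumptions made for illustration and are not taken from the source.

```python
# Minimal sketch of the mouth boundary condition C_i(0, t) = P_i(0, t) / (R_g T(t)).
# Units and example values are illustrative assumptions, not values from the paper.

R_GAS = 8.314462618        # ideal gas constant, J/(mol K)
PA_PER_TORR = 133.322368   # pascals per Torr


def mouth_concentration(p_torr: float, temperature_k: float) -> float:
    """Return the molar concentration (mol/m^3) for a partial pressure in Torr."""
    if temperature_k <= 0.0:
        raise ValueError("temperature must be positive (kelvin)")
    return p_torr * PA_PER_TORR / (R_GAS * temperature_k)


# Hypothetical example: 1 Torr at 300 K.
c_example = mouth_concentration(1.0, 300.0)
```

Because the relation is linear in pressure and inversely proportional to temperature, raising the wafer temperature at fixed partial pressure lowers the mouth concentration.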

For the purpose of identifying the important parameters that dictate the step coverage and to determine
their dependence on the CVD chemistry and operating conditions, the model equations are nondimensionalized
[2]. The dimensionless variables are listed in Table 1. With these definitions, the model equations in
dimensionless form are:

Material  balance  for  species  i 


dOii^.r) 


di 


+  >*RiP  (i ,  r )  ^  Vji  Gj  (£,  r )  <pj  (r )  (6) 

j~  1 


Boundary  conditions 


dOjjjbPr) 

di 


.  _  Pi(o,t)r(o)  Q(o,t) 

H  ;  Pi(0,0)T(t)  Ct( 0,0) 


DrqCro  ^4(0.0)  7i(r) 

DaCio  H{0)P(0,0)V(ib,T) 


T)^b(T) 

j  =  l 


(7) 

(8) 
Initial conditions

$$\theta_i(\xi,0) = 1 \qquad (9)$$

$$\mathcal{H}(0) = 1 \qquad (10)$$

where $0 < \xi < \xi_b$, $\tau > 0$, and $\xi_b = 1$. $\varphi_j$ is referred to as the step coverage modulus for the reaction j and is
given by

$$\varphi_j(\tau) = \frac{2H^2(0)\,R_j(0,t)}{C_R(0,0)\,W(0,0)\,D_R(0,0)} \qquad (11)$$

Finally  XRi  denotes  a  partial  pressure  ratio  and  is  defined  as  XRi  =  The  subscript  “0”  represents 

the  position  £  =  0  and  time  r  =  0,  and  the  subscript  “R”  indicates  the  reference  species. 
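
To make the role of the step coverage modulus concrete, the sketch below evaluates it for one set of inputs, assuming equation (11) reads $\varphi_j = 2H^2(0)R_j(0,t)/(C_R(0,0)\,W(0,0)\,D_R(0,0))$. The function name, the SI units and every numerical value except the 0.7 by 3 micron trench size are hypothetical.

```python
# Minimal sketch of the step coverage modulus, assuming the reading of eq. (11)
# phi = 2 H0^2 R0 / (C_R0 W0 D_R0). All inputs use SI units so phi is dimensionless.

def step_coverage_modulus(h0: float, w0: float, rate0: float,
                          c_ref0: float, d_ref0: float) -> float:
    """h0, w0 in m; rate0 in mol/(m^2 s); c_ref0 in mol/m^3; d_ref0 in m^2/s."""
    return 2.0 * h0 ** 2 * rate0 / (c_ref0 * w0 * d_ref0)


# Trench size from the results section (0.7 x 3 microns); the rate, concentration
# and diffusivity below are made-up placeholders, not data from the paper.
phi_example = step_coverage_modulus(h0=3.0e-6, w0=0.7e-6, rate0=1.0e-4,
                                    c_ref0=0.05, d_ref0=1.0e-4)
```

The modulus grows with the square of the feature depth, so under this reading doubling the depth at fixed width quadruples it.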

For blanket tungsten LPCVD using the hydrogen reduction of tungsten hexafluoride, the reaction stoichiometry
and rate expression are [14]:

$$3\mathrm{H_2} + \mathrm{WF_6} \longrightarrow \mathrm{W} + 6\mathrm{HF}$$

and

V  T(t)J  1  +kpPF(Z,t)  ’  \cm2s 


T(t) )  1  +  kppF{Z,t)  ’  \cm2sj  ^ 

where $p_F$ and $p_H$ are the local partial pressures of tungsten hexafluoride and hydrogen in Torr, and T is the
temperature in Kelvin. In the following, the subscript F represents tungsten hexafluoride and H represents
hydrogen.

For  this  chemistry,  the  dimensionless  species  balances  are  [2]: 


and 


d0F_DF( 0,0)  1  d 

~D~b 

96  h 


f\  .  <pG 

8r  ~  D„(0,0)WH1»(  )  +  ^  “F W 


The  boundary  conditions  are: 


dr 


oH(o  T )  =  phMIM 

H[  '  >  pH(0,0)r(i) 


and 


d6n(^b,T) 


grMD 

pF(0,0)T(t) 

3«W 


2  aQV(^r) 

dOrii^T)  _  PF(0, 0)  ptf(0,0)  H(t) 


dS 


Dh( 0,0)  pF( 0,0)  2 a0'D((b,r) 


4>(t)G{£  b,  t) 

<?(r)G(6»r) 


(13) 

(14) 

(15) 

(16) 

(17) 

(18) 


The  initial  conditions  are: 


$$\mathcal{H}(0) = 1; \quad \theta_F(\xi,0) = 1; \quad \theta_H(\xi,0) = 1; \quad 0 < \xi < 1 \qquad (19)$$

where G is the dimensionless rate of reaction, $\varphi$ is the step coverage modulus given by equation (11), and
$a_0$ is the initial aspect ratio, given by $a_0 = H_0/W_0$; $\nu_i$ is the generalized stoichiometric coefficient of species i
($\nu_F = -1$, $\nu_H = -3$), and $\lambda_{HF}$ is
to follow an admissible trajectory $X^*(t)$ that minimizes a performance index of the form

$$J(U) = h(X(t_f),t_f) + \int_{t_0}^{t_f} g(X(t),U(t),t)\,dt \qquad (38)$$

and satisfies the boundary conditions $X(t_0) = X_0$ and $X(t_f) = X_f$, where $t_0$ is the specified initial time and
$t_f$ is the unknown final time.

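By defining the Hamiltonian function

$$\mathcal{H}(X(t),U(t),P(t),t) = g(X(t),U(t),t) + P^T(t)\left[a(X(t),U(t),t)\right] \qquad (39)$$

the necessary conditions for optimality can be written as [17]

$$\dot{X}^*(t) = \frac{\partial\mathcal{H}}{\partial P}(X^*(t),U^*(t),P^*(t),t) \qquad (40)$$

$$\dot{P}^*(t) = -\frac{\partial\mathcal{H}}{\partial X}(X^*(t),U^*(t),P^*(t),t) \qquad (41)$$

$$\mathcal{H}(X^*(t),U^*(t),P^*(t),t) = \min_{U(t)}\mathcal{H}(X^*(t),U(t),P^*(t),t) \qquad (42)$$

for all $t \in [t_0, t_f]$, with boundary conditions

$$X^*(t_0) = X_0 \qquad (43)$$

$$X^*(t_f) = X_f \qquad (44)$$

$$\mathcal{H}(X^*(t_f),U^*(t_f),P^*(t_f),t_f) + \frac{\partial h}{\partial t}(X^*(t_f),t_f) = 0 \qquad (45)$$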

If the costate $P(t_0)$ were known, equations (40)-(42) could be solved using numerical integration. Since this
is not the case, we guess values $P^{(0)}(t_0)$ and $t_f^{(0)}$ for the initial costate and the final time, and use them to
numerically integrate (40)-(41) from $t_0$ to $t_f$. The observed values of $X(t_f)$ are then used to systematically adjust the
guessed values of $P(t_0)$ and $t_f$. One technique for making systematic adjustments of the initial costate values
and the final time is based on Newton’s method for finding roots of nonlinear equations. Thus, the following
equations are obtained for updating $P^{(i+1)}(t_0)$ and $t_f^{(i+1)}$.


Xf-[X(tf)}.  = 


0  = 


dX(tf) 

[dP(t0)\ 

\rntf) 

dX(tf)  '  dX (tf)  dt 
9H(tf)  d  dh(tf) 
dtf  +  dtf  dt 


+ 


{P(i+1)(t0)-pM(to)}  + 
d  dh(tf) 


&X  (tf) 

dtf 


[t 


[*/-[*(*/)]<]  + 


(i+1)  _  ^(i) 


(4i+i)  -  tf) 

■miff)  dpjtf) 
dP{ts)  dP(t0)\ 


(46) 


[P(i+1)(to)-P(i)(fo)]. 


/ 


(47) 


Thus,  the  update  law  is  given  by 


P(i+1)(to)  -  P^(to) 

Ji) 

1/  -  If 


m  tf) 


+ 


Xf-lXitf)^ 


dh(tf) 


dxjtfj  at 


\Xf-\Xitf)], 


pw  = 


PxiP^(t0),tf) 

J"  ax(tf)  I 
dt>  J  i 

' 

[dp(tf)\ 

Vp(p^(t0),tf) 

i  J 

,  d  mtr)  i 

dtf  !  dtf  dt 

i  - 

(48) 

(49) 




The iteration is terminated when $\|X_f - X(t_f)\| < \gamma$ is satisfied, where $\gamma$ denotes the admissible error.
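
The shooting idea described above (guess the initial costate and the final time, integrate, then correct both with Newton's method until the terminal conditions hold) can be sketched on a toy problem. The problem below is an assumption chosen because its solution is known in closed form: minimize the integral of $1 + u^2/2$ subject to $\dot{x} = u$, $x(0) = 0$, $x(t_f) = 1$ with free $t_f$. It is not the deposition model, and the Jacobian is approximated by finite differences rather than by the influence functions used in the text.

```python
# Minimal shooting sketch with free final time on a hypothetical toy problem:
# minimize integral of (1 + u^2/2) dt, x' = u, x(0) = 0, x(tf) = 1, tf free.
# Hamiltonian: H = 1 + u^2/2 + p*u, minimized by u = -p; the costate obeys p' = 0.
import numpy as np


def terminal_residual(p0: float, tf: float, x0: float = 0.0,
                      xf: float = 1.0, steps: int = 200) -> np.ndarray:
    """Integrate the state-costate equations and return [x(tf) - xf, H(tf)]."""
    x, p = x0, p0
    dt = tf / steps
    for _ in range(steps):
        u = -p            # control that minimizes the Hamiltonian
        x += dt * u       # x' = dH/dp = u (Euler is exact here, u is constant)
        # p' = -dH/dx = 0, so the costate keeps its initial value
    u = -p
    hamiltonian = 1.0 + 0.5 * u * u + p * u
    return np.array([x - xf, hamiltonian])


def shoot(p0: float, tf: float, tol: float = 1e-10,
          max_iter: int = 50, eps: float = 1e-6) -> np.ndarray:
    """Newton iteration on the unknowns (p0, tf) with a finite-difference Jacobian."""
    z = np.array([p0, tf], dtype=float)
    for _ in range(max_iter):
        r = terminal_residual(z[0], z[1])
        if np.linalg.norm(r) < tol:
            break
        jac = np.empty((2, 2))
        for k in range(2):
            dz = np.zeros(2)
            dz[k] = eps
            jac[:, k] = (terminal_residual(*(z + dz)) - r) / eps
        z = z - np.linalg.solve(jac, r)
    return z


p0_opt, tf_opt = shoot(p0=-1.0, tf=1.0)
```

For this toy problem the exact answer is $p_0 = -\sqrt{2}$ and $t_f = 1/\sqrt{2}$, which gives a simple check on the iteration.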

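In the above equation, $\mathcal{P}_x(P^{(i)}(t_0),t)$ is the n by n matrix of partial derivatives of the components of
X(t) with respect to each of the components of $P(t_0)$, evaluated at $P^{(i)}(t_0)$; $\mathcal{P}_p(P^{(i)}(t_0),t)$ is the n by n
matrix of partial derivatives of the components of P(t) with respect to each of the components of $P(t_0)$,
evaluated at $P^{(i)}(t_0)$, i.e.,

$$\mathcal{P}_x(P^{(i)}(t_0),t) = \begin{bmatrix} \dfrac{\partial x_1(t)}{\partial p_1(t_0)} & \dfrac{\partial x_1(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial x_1(t)}{\partial p_n(t_0)} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n(t)}{\partial p_1(t_0)} & \dfrac{\partial x_n(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial x_n(t)}{\partial p_n(t_0)} \end{bmatrix}_{P^{(i)}(t_0)} \qquad (50)$$

$$\mathcal{P}_p(P^{(i)}(t_0),t) = \begin{bmatrix} \dfrac{\partial p_1(t)}{\partial p_1(t_0)} & \dfrac{\partial p_1(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial p_1(t)}{\partial p_n(t_0)} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial p_n(t)}{\partial p_1(t_0)} & \dfrac{\partial p_n(t)}{\partial p_2(t_0)} & \cdots & \dfrac{\partial p_n(t)}{\partial p_n(t_0)} \end{bmatrix}_{P^{(i)}(t_0)} \qquad (51)$$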


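$\mathcal{P}_p$ is called the costate influence function matrix, and $\mathcal{P}_x$ is the state influence function matrix. Notice
that equation (49) requires that $\mathcal{P}_p$ and $\mathcal{P}_x$ are known only at the terminal time $t_f$. The notation $[\cdot]_i$ means
that the enclosed terms are evaluated on the i-th trajectory. Taking the partial derivatives of (40)-(41) with
respect to the initial value of the costate vector and assuming that $\partial\dot{X}/\partial P(t_0)$ and $\partial\dot{P}/\partial P(t_0)$ are continuous with
respect to $P(t_0)$ and t so that the order of differentiation can be interchanged, we obtain

$$\frac{d}{dt}\left[\mathcal{P}_x(P^{(i)}(t_0),t)\right] = \left[\frac{\partial^2\mathcal{H}}{\partial P\,\partial X}(t)\right]_i \mathcal{P}_x(P^{(i)}(t_0),t) + \left[\frac{\partial^2\mathcal{H}}{\partial P^2}(t)\right]_i \mathcal{P}_p(P^{(i)}(t_0),t) \qquad (52)$$

$$\frac{d}{dt}\left[\mathcal{P}_p(P^{(i)}(t_0),t)\right] = -\left\{\left[\frac{\partial^2\mathcal{H}}{\partial X^2}(t)\right]_i \mathcal{P}_x(P^{(i)}(t_0),t) + \left[\frac{\partial^2\mathcal{H}}{\partial X\,\partial P}(t)\right]_i \mathcal{P}_p(P^{(i)}(t_0),t)\right\} \qquad (53)$$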


where the i-th trajectory is obtained by integrating the reduced state-costate equations with initial conditions
$X(t_0) = X_0$, $P(t_0) = P^{(i)}(t_0)$. The initial conditions for the influence function equations are

$$\mathcal{P}_x(P^{(i)}(t_0),t_0) = \left[\frac{\partial X(t_0)}{\partial P(t_0)}\right]_{P^{(i)}(t_0)} = 0 \qquad (54)$$

$$\mathcal{P}_p(P^{(i)}(t_0),t_0) = \left[\frac{\partial P(t_0)}{\partial P(t_0)}\right]_{P^{(i)}(t_0)} = I \qquad (55)$$

Equation (54) is derived by observing that a change in any of the components of $P(t_0)$ does not affect the
value of $X(t_0)$, since the state values are specified at time $t_0$. Equation (55) follows from the observation
that a change in the j-th component of $P(t_0)$ changes only $P_j(t_0)$.
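
The influence function equations can be integrated alongside the state and costate, which is sketched below for a scalar toy problem chosen only because it has a closed-form answer: minimize the integral of $(x^2 + u^2)/2$ with $\dot{x} = u$, fixed final time, $x(0) = 1$ and $x(t_f) = 0$. Here the reduced equations are $\dot{x} = -p$ and $\dot{p} = -x$, so the influence functions obey $\dot{\mathcal{P}}_x = -\mathcal{P}_p$ and $\dot{\mathcal{P}}_p = -\mathcal{P}_x$ with the initial values 0 and 1. The problem, the function names and the Runge-Kutta integrator are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: propagate state, costate and influence functions together,
# then correct the initial costate with a Newton step p0 += (xf - x(tf)) / P_x(tf).
# Toy problem (hypothetical): minimize integral of (x^2 + u^2)/2, x' = u, fixed tf.
import numpy as np


def rhs(z: np.ndarray) -> np.ndarray:
    """Right-hand side for z = [x, p, P_x, P_p] with u = -p already substituted."""
    x, p, px, pp = z
    return np.array([-p, -x, -pp, -px])


def integrate(p0: float, x0: float = 1.0, tf: float = 1.0,
              steps: int = 200) -> np.ndarray:
    """Classical fourth-order Runge-Kutta integration from t0 = 0 to tf."""
    z = np.array([x0, p0, 0.0, 1.0])   # P_x(t0) = 0 and P_p(t0) = 1
    dt = tf / steps
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z


def newton_costate(p0: float, xf: float = 0.0, tol: float = 1e-10,
                   max_iter: int = 20) -> float:
    """Adjust the initial costate until the terminal state constraint is met."""
    for _ in range(max_iter):
        x_tf, _, px_tf, _ = integrate(p0)
        if abs(xf - x_tf) < tol:
            break
        p0 += (xf - x_tf) / px_tf
    return p0


p0_opt = newton_costate(0.0)
```

Because this toy problem is linear, a single Newton step is essentially exact; the analytic values are $\mathcal{P}_x(t_f) = -\sinh(1)$ and $p_0 = 1/\tanh(1)$.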

Returning  to  our  specific  problem,  we  are  interested  in  finding  an  optimal  temperature  trajectory  which 
minimizes  the  processing  time.  Hence,  the  performance  index  is  the  processing  time  while  the  control  is  the 
temperature,  T,  i.e., 

$$J(T) = \int_{t_0}^{t_f} dt \qquad (56)$$

Comparing with (38), we note that $h(X(t_f),t_f) = 0$ and $g(X(t),T(t),t) = 1$, and the boundary conditions
are

$$X^*(t_0) = X_0 \qquad (57)$$

$$X^*(t_f) = X_f$$

$$\mathcal{H}(X^*(t_f),T^*(t_f),P^*(t_f),t_f) = 0$$

The  update  law  for  this  problem  is 


P(i+1)(t0)-P«(t0) 


f(*+l)  _  f(l) 

V 


=  R 


>-i 


I  9X{tf) 


Xf-ixm 


p  _ 


,  [  , 
L  pp(t/)j 


.(*/- 

i 

dX(t<) 


(*(*/)], 


dtf 
dt  f 


(58) 

(59) 

(60) 
(61) 


\Pp(PW(to),tf) 

It should be emphasized that the above iterative procedure of computing the optimal trajectory involves
solutions of nonlinear equations which may not be physically meaningful at every step. For example, especially
during the iterations when the feature is near closure, a complex solution for the instantaneous step
coverage P may occur. Although a systematic way to prevent this situation is feasible (e.g., by introducing
an additional constraint in the optimal control problem), for the sake of simplicity we adopted a different
approach. That is, we use a strategy similar to the PRCVD process [1, 2] in that we only find the optimal
control during the first leg to minimize the corresponding time, while for the second leg the processing time
is fixed and the temperature is held constant.


4  Reactor-Scale  Modeling  of  Single  Wafer  Tungsten  LPCVD 

In this section, our objective is to determine the reactor-scale processing conditions that yield the previously
developed desired trajectories of deposition conditions at any point on the wafer surface. To achieve
this, we consider a reactor-scale model with the manipulated variables (control inputs) being the susceptor
temperature T (K), total pressure $p_0$ (Torr) and flowrates of WF6, H2 and inert carrier gas (sccm). Here,
for simplicity, we assume that the control of the temperature is sufficiently good so that the prescribed
temperature trajectory can be realized with adequate accuracy and spatial uniformity. The output variables
(or controlled outputs) are the partial pressures of WF6 and H2. For control purposes, the modeling of such
a process aims to establish relationships between the manipulated and output variables. Before any candidate
models are proposed, experimental design and statistical analysis are usually helpful in determining a
suitable model structure.

In the following we discuss the use of a simulation test bed rather than an actual reactor for model
development. Simulation data are generated using a simulation platform called CFDSWR [18, 19], which
simulates an LPCVD reactor. A full factorial experimental design is adopted and Yates' analysis [20] is applied
to the simulation data in order to determine the most significant relationships between the manipulated
variables and the output variables. In order to develop such a relationship, we observe that in a reaction-free
environment  the  partial  pressures  pF  and  pH  could  be  determined  by  the  corresponding  mole  fractions  and 
the  total  pressure.  On  the  other  hand,  when  chemical  reactions  occur,  pF  and  pH  at  the  wafer  surface  will 
decrease  due  to  the  consumption  of  the  reactants  during  the  reaction. 

Assuming that the amount of reduction of each partial pressure is proportional to the apparent overall
reaction rate for the corresponding species, we employ multiple response surface techniques to approximate
these rates by an empirical expression of the manipulated variables. Thus, we arrive at the following empirical
(but physically motivated) model, relating the manipulated variables with the reactant partial pressures at
a point on the wafer surface

$$p_F(t) = q_{11} + q_{12}\,T(t) + q_{13}\,F_H(t) + q_{14}\,[\ldots] \qquad (62)$$

$$p_H(t) = q_{21} + q_{22}\,T(t) + q_{23}\,F_H(t) + q_{24}\,[\ldots] \qquad (63)$$

where  qjk  are  model  parameters,  T  is  the  susceptor  temperature,  p0  is  the  total  pressure  and  FH  the  flow  rate 
of  H2.  Notice  that  the  “reaction  feedback”  modeling  idea  is  invoked  in  selecting  the  parametric  structure  of 
the  above  model.  That  is,  the  last  term  in  the  above  equations  describes  the  reaction-free  dependence  of  the 
partial  pressures  on  the  manipulated  variables  while  the  remaining  terms  describe  the  effect  of  the  partial 
pressure  reduction  due  to  the  reactions. 

The model parameters $q_{jk}$ are then estimated via a least squares algorithm to minimize the errors between
the partial pressure trajectories of the reactants (WF6 and H2) as computed by the reactor-scale simulation
platform CFDSWR [18, 19], and the trajectories predicted by the above empirical models. Of course, in
practice, the parameters could be estimated using data obtained from experiments.
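
Since the empirical model is linear in its parameters, the least squares estimate reduces to an ordinary linear regression. The sketch below fits one output of the model on synthetic data. The regressor multiplying the fourth parameter is not legible in the source, so a generic placeholder array `last_regressor` is used; the synthetic data, the parameter values and the choice to fit all four parameters at once are assumptions for illustration only.

```python
# Minimal least-squares sketch for a model of the form
#   p = q1 + q2*T + q3*F_H + q4*r,
# where r is a placeholder for the reaction-free term (its exact form is unknown here).
import numpy as np


def fit_partial_pressure_model(temperature: np.ndarray, h2_flow: np.ndarray,
                               last_regressor: np.ndarray,
                               pressure: np.ndarray) -> np.ndarray:
    """Return the least-squares estimate [q1, q2, q3, q4]."""
    design = np.column_stack([np.ones_like(temperature), temperature,
                              h2_flow, last_regressor])
    q, *_ = np.linalg.lstsq(design, pressure, rcond=None)
    return q


# Synthetic, noise-free data standing in for simulator output (all values hypothetical).
rng = np.random.default_rng(0)
T = rng.uniform(650.0, 750.0, 40)      # susceptor temperature, K
F_H = rng.uniform(100.0, 500.0, 40)    # H2 flowrate, sccm
r = rng.uniform(0.05, 0.2, 40)         # placeholder reaction-free regressor
q_true = np.array([0.5, -4.0e-4, -2.0e-4, 1.0])
p_F = q_true[0] + q_true[1] * T + q_true[2] * F_H + q_true[3] * r

q_hat = fit_partial_pressure_model(T, F_H, r, p_F)
```

With noise-free data the fit recovers the generating parameters; with simulator or experimental data the residuals would indicate how well the assumed structure holds locally.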


4.1  Recursive  Computation  of  Flowrates  in  the  Tungsten  LPCVD  Reactor 

The OCCVD optimal temperature trajectory is obtained under the assumption that the partial pressures of
WF6 and H2 on the wafer surface are kept constant. However, for constant flowrates and total pressure, the
time variation of the optimal temperature trajectory induces a time variation in the partial pressures that
has a detrimental effect on the deposition profile. In order to maintain constant partial pressures, we need
to vary the other two reactor-scale manipulated variables, namely the total pressure $p_0$ and the H2 flowrate
$F_H$ (the WF6 flowrate is kept constant). To obtain the desired trajectories for which the partial pressures on
the wafer surface remain constant under the OCCVD optimal temperature trajectory, we employ Newton’s
algorithm, based on our semi-empirical model. Such a problem is formulated by letting


$$y = f(u, \theta) \qquad (64)$$

denote the simplified model (62)-(63), where $y = [p_F, p_H]^T$, $u = [p_0, F_H]^T$ and $\theta$ denotes the vector of
model parameters, $\theta = [q_{11}, q_{12}, q_{13}, q_{21}, q_{22}, q_{23}]^T$. We assume that there exists a vector $\theta^*$ for which (64)
is a “good” local approximation of the actual process, i.e., $y = f(u, \theta^*)$. Also, let $y^*$ denote the desired
trajectory to be achieved by varying u. Given an initial guess for u, we compute y using the simulation
platform (CFDSWR). Then, u is iteratively updated by


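$$u_{k+1} = u_k + \left[\frac{\partial f}{\partial u}(u_k,\theta_k)\right]^{-R}(y^* - y_k) \qquad (65)$$

while $\theta_k$ is also updated in order to maintain the local fidelity of the approximation via an “indirect adaptation”
of the form

$$\theta_{k+1} = \theta_k + \left[\frac{\partial f}{\partial \theta}(u_k,\theta_k)\right]^{-R}\left[y_k - f(u_k,\theta_k)\right] \qquad (66)$$

until $|y^* - y_k| < \epsilon$, for a given error threshold $\epsilon$. In the above equations, the notation $(\cdot)^{-R}$ is used to signify
the right-inverse of the corresponding matrix.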
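
The interplay between the input update and the parameter update can be sketched with a toy example. Below, a fixed linear map stands in for the reactor simulator and the simplified model is $f(u,\theta) = \Theta u$, with $\theta$ holding the four entries of $\Theta$; both are assumptions for illustration and differ from the structure of (62)-(63). The right-inverse is computed with the Moore-Penrose pseudoinverse, which coincides with a right-inverse for full-row-rank matrices.

```python
# Minimal sketch of the Newton input update with indirect adaptation of the model.
# The "plant" is a hypothetical stand-in for the simulator, not CFDSWR.
import numpy as np

A_TRUE = np.array([[2.0, 0.5], [0.3, 1.5]])


def plant(u: np.ndarray) -> np.ndarray:
    """Stand-in for the reactor-scale simulator (hypothetical linear map)."""
    return A_TRUE @ u


def model(u: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Simplified model f(u, theta) = Theta u, Theta being theta reshaped to 2x2."""
    return theta.reshape(2, 2) @ u


def track(y_star: np.ndarray, u: np.ndarray, theta: np.ndarray,
          tol: float = 1e-10, max_iter: int = 100):
    """Iterate the input and parameter updates until the output error is below tol."""
    for _ in range(max_iter):
        y = plant(u)
        if np.linalg.norm(y_star - y) < tol:
            break
        df_du = theta.reshape(2, 2)                    # Jacobian of f w.r.t. u
        df_dtheta = np.kron(np.eye(2), u[None, :])     # 2x4 Jacobian w.r.t. theta
        u_next = u + np.linalg.pinv(df_du) @ (y_star - y)
        theta = theta + np.linalg.pinv(df_dtheta) @ (y - model(u, theta))
        u = u_next
    return u, theta


y_target = np.array([1.5, 1.65])
u_opt, theta_opt = track(y_target, u=np.array([1.0, 1.0]),
                         theta=np.array([1.8, 0.4, 0.2, 1.6]))
```

In this toy setting each parameter update makes the model reproduce the latest plant output exactly, so the model stays locally faithful while the input converges; how close the initial parameter guess must be for a real simulator is a separate question.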

Finally, it should be mentioned that transients in the input and parameter updates could cause u to
assume values outside the domain of validity and/or convergence of the simulation model. As a remedy for this
problem, projection techniques can be employed to limit the range of the inputs and the parameters [21].
Notice that, in contrast to the input constraints, the parameter constraints may be difficult to determine.
For this purpose, the approach of [22] can be used to obtain an initial estimate and update the parameter
uncertainty set.
5  Results and Discussion


Here we consider tungsten deposition by hydrogen reduction of tungsten hexafluoride on patterned wafers
with 0.7 × 3 μm trenches, modeled as described in previous sections. For a desired step coverage of SC = 99.5%
and constant partial pressures on the wafer surface ($p_F$ = 0.125 Torr, $p_H$ = 1.875 Torr), under the optimal
control CVD (OCCVD) approach we select the time of the second leg as 11 seconds. This value is somewhat
arbitrary and could be selected more systematically by performing a one-parameter optimization with respect
to the duration of the second leg. For the sake of simplicity, this approach is not adopted in the present
study. With the temperature held constant in the second leg, equal to the final temperature of the first leg
in the previous iteration, (40)-(41) are integrated backwards in time to obtain the final state $X(t_f)$ at the
end of the first leg. This final state is then substituted in (60) to yield the next estimate for $P(t_0)$ and $t_f$.
These values are, in turn, used to integrate the state equations (40)-(41) forward in time with the input U
(temperature) computed from (42). The procedure is repeated until convergence is achieved. The results
of the OCCVD approach are shown in Fig. 3. The first leg time is 14 seconds and the total time, for 98
percent closure, is 25 seconds. Under the same partial pressure conditions, the temperature trajectories for
the OCCVD and CRCVD approaches are shown in Fig. 4. Compared to the CRCVD approach, OCCVD yields
a 34 percent reduction of processing time.
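
The timing figures quoted above can be cross-checked with simple arithmetic. The sketch below adds the two leg durations and, assuming the 34 percent reduction is measured relative to the CRCVD processing time (an interpretation, since the CRCVD time itself is not stated here), backs out the implied CRCVD time.

```python
# Arithmetic check of the reported processing times (seconds).
first_leg_s = 14.0
second_leg_s = 11.0
total_occvd_s = first_leg_s + second_leg_s          # reported total: 25 s

# Assumption: "34 percent reduction" means total_occvd = (1 - 0.34) * total_crcvd.
reduction = 0.34
implied_crcvd_s = total_occvd_s / (1.0 - reduction)
```

Under that assumption the implied CRCVD processing time is about 38 seconds.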

The above results demonstrate that significant savings in processing time can be obtained, without compromising
the step coverage constraint, by using optimal control theory to compute temperature trajectories
for a CVD process. The price paid for these improvements is the increased computational complexity of the
solution and the difficulty in implementing the optimal trajectories on the actual process. Without attempting
to completely resolve these issues here, we note that the former is of lesser significance, especially in view
of the computational power of modern computers.

On  the  other  hand,  the  implementation  of  the  optimal  trajectories  may  give  rise  to  a  more  classical 
control  problem,  whose  solution  may  be  iterated  together  with  the  computation  of  the  optimal  trajectory. 
Consider, for example, the same CVD process and suppose that the wafer temperature is adjusted by a
local temperature controller. The formulation of a similar optimal control problem for this case would
use the desired wafer temperature (reference input or set-point in the local temperature controller) as the
manipulated (control) variable. Assuming a simple (e.g., first or second order) dynamic description of
the relationship between the desired and actual wafer temperatures, we could then proceed to find the optimal
desired and actual temperature trajectories as before. Following this step, we can then design a temperature
controller  whose  objective  is  to  achieve  the  assumed  dynamic  description  between  the  optimal  reference 
trajectory  and  the  actual  wafer  temperature,  in  the  presence  of  modeling  errors  and/or  disturbances.  The 
essence  of  this  description  is  to  introduce  a  constraint  on  the  speed  of  variation  of  the  actual  wafer  temperature 
in  the  optimal  control  problem  formulation.  Another  possibility  would  be  to  introduce  constraints  on  the 
derivative  of  the  temperature.  The  selection  of  the  simple  model  should  be  such  that  its  bandwidth  (and, 
consequently,  that  of  the  actual  wafer  temperature)  is  within  the  bandwidth  of  the  local  temperature  closed- 
loop.  Thus,  although  the  exact  relationship  between  the  desired  and  the  actual  wafer  temperatures  is  very 
complicated  or  even  unknown,  the  proper  use  of  the  local  feedback  controller  can  ensure  the  successful 
implementation  of  the  optimal  temperature  trajectory. 

Next, for the implementation of the OCCVD approach discussed here, the partial pressures on the
wafer surface must be maintained constant, as assumed during the solution of the optimal control problem.
Solving the corresponding inverse problem for the reactor-scale model, we compute the desired trajectories
for the manipulated variables $p_0$ and $F_H$ that correspond to the desired partial pressures and optimal
temperature trajectories (Fig. 5 and Fig. 6). As expected, the simulation of the reactor-scale model with
these manipulated variable trajectories yields nearly constant partial pressures on the wafer surface (Fig. 7
and Fig. 8). The resulting deposition profiles, simulated with EVOLVE, are shown in Fig. 9, for the uncontrolled
case, and Fig. 10 for the controlled case. Notice that, although for this problem the differences in step
coverage are relatively small (99.7% and 99.1% for the controlled and uncontrolled cases, respectively), a
significant deterioration in the step coverage and trench closure may be observed in general, if the reactor-scale
control is not implemented.

Finally, a more subtle problem arises from the need to establish such an optimal trajectory generation
procedure as a practical alternative to the current approaches. Due to its complexity, the above procedure
should be “automated” to a great extent in order to become usable by process engineers (e.g., provide the
ability to easily generate a new optimal trajectory when the processing conditions change). Also notice that
the proposed approach is “open loop” and requires relatively accurate models for its successful application.
Although a quick remedy is to use on- or off-line adaptive techniques to ensure the accuracy of the model,
an attractive alternative would be a closed loop design using other measurable quantities (e.g., deposition
rates, outlet concentrations) from which the partial pressures of H2 and WF6 at the wafer can be inferred.
These issues, however, extend beyond the scope of this study and are left as a topic of future research.
Figure 1: Schematic diagram of the single wafer LPCVD reactor.

Figure 2: Idealized cross section of a feature during deposition.

Figure 3: Deposition profiles under OCCVD (solid: full feature-scale model (EVOLVE); dashed: simplified
model). Plot title: OCCVD Deposition Profile; horizontal axis: trench width (microns).

Figure 4: Temperature trajectory comparison under CRCVD and OCCVD.

Figure 7: Comparison of the resulting partial pressure trajectories for WF6 at the wafer surface (constant
versus controlled total pressure and H2 flowrate).

Figure 8: Comparison of the resulting partial pressure trajectories for H2 at the wafer surface (constant
versus controlled total pressure and H2 flowrate).

Figure 9: Comparison of resulting deposition profiles (EVOLVE simulation): constant versus controlled
reactor variables (total pressure and H2 inlet flowrate). Legend: controlled and uncontrolled partial pressures of H2 and WF6.

Figure 10: Comparison of resulting deposition profiles (EVOLVE simulation): constant partial pressures
versus resulting partial pressures (controlled by the reactor variables total pressure and H2 inlet flowrate).
Legend: constant and controlled partial pressures of H2 and WF6.

